Parsing timestamps from Apache log files in Python
By default, timestamps in Apache access log files have the format:
27/Oct/2013:06:33:40 +0100
In Python 2, this cannot be parsed with the strptime() function from the time/datetime modules, because strptime() there has no %z placeholder to match the numeric UTC offset (only %Z, which matches timezone names). Python 3.2 and later do accept %z in datetime.strptime(). Passing the raw string to the parse() function from dateutil.parser does not work either: it fails to recognize the format, and there is no simple way to give it a format string.
After looking for the "best" solution for quite a while, here is probably the most elegant way to do it: replace the colon between the date and the time with a space, and let parse() handle the rest:
>>> from dateutil.parser import parse
>>> d = '27/Oct/2013:06:33:40 +0100'
>>> parse(d[:11] + " " + d[12:])
datetime.datetime(2013, 10, 27, 6, 33, 40, tzinfo=tzoffset(None, 3600))
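On Python 3.2 or later, the standard library alone may be enough, since datetime.strptime() accepts %z there. The following is a minimal sketch, not a definitive implementation. It assumes the default English month abbreviations (the %b directive is locale-dependent), and the names APACHE_TIME_FORMAT and parse_apache_timestamp are made up for illustration:

```python
from datetime import datetime

# Format string matching Apache's default timestamp,
# e.g. 27/Oct/2013:06:33:40 +0100 (assumes English month names for %b).
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_timestamp(text):
    """Parse an Apache access-log timestamp into a timezone-aware datetime.

    Hypothetical helper for illustration; requires Python 3.2+ for %z.
    """
    # In a full log line the timestamp is wrapped in square brackets,
    # so strip them if they are present.
    return datetime.strptime(text.strip("[]"), APACHE_TIME_FORMAT)


d = parse_apache_timestamp("27/Oct/2013:06:33:40 +0100")
print(d.isoformat())  # 2013-10-27T06:33:40+01:00
```

The result carries the +0100 offset as its tzinfo, so it can be compared with other timezone-aware datetimes or converted to UTC with astimezone().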