Parsing timestamps from Apache log files in Python

October 29, 2013 – tagged Apache, Programming

Timestamps in Apache access log files have, by default, the format

27/Oct/2013:06:33:40 +0100

In Python 2, this cannot be parsed with the strptime() function from the time/datetime modules, because its strptime() has no %z placeholder to match the numeric UTC offset (only %Z, which matches timezone names). Python 3.2 and later do accept %z in datetime.strptime(). The parse() function from dateutil.parser does not work on the raw string either: it fails to recognize the format, and it takes no format string that could tell it what to expect.

After looking for the "best" solution for quite a while, this is probably the most elegant way to do it: replace the colon between the date and the time with a space, then let dateutil parse the result:

>>> from dateutil.parser import parse
>>> d = '27/Oct/2013:06:33:40 +0100'
>>> parse(d[:11] + " " + d[12:])
datetime.datetime(2013, 10, 27, 6, 33, 40,
                  tzinfo=tzoffset(None, 3600))
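
For readers on Python 3.2 or later, here is a minimal sketch of the same parse using only the standard library, since datetime.strptime() understands %z there. The function name parse_apache_timestamp is made up for this example, and the sketch assumes the month abbreviation in the log matches the current locale (%b is locale-dependent; Apache writes English abbreviations).

```python
from datetime import datetime, timedelta, timezone

# Default Apache access log timestamp layout, e.g. 27/Oct/2013:06:33:40 +0100
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_timestamp(text):
    """Return a timezone-aware datetime for an Apache log timestamp.

    Hypothetical helper for illustration; requires Python 3.2+ for %z
    support in strptime(). %b relies on English month names being valid
    in the active locale.
    """
    return datetime.strptime(text, APACHE_TIME_FORMAT)


if __name__ == "__main__":
    ts = parse_apache_timestamp("27/Oct/2013:06:33:40 +0100")
    print(ts.isoformat())                           # 2013-10-27T06:33:40+01:00
    print(ts.astimezone(timezone.utc).isoformat())  # 2013-10-27T05:33:40+00:00
```

The result carries the +0100 offset as its tzinfo, so it can be compared with other aware datetimes or converted to UTC, just like the value returned by dateutil above.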