python log reader
Mar 05, 2006, 11:37pm EST
I often want to run a quick analysis of my Apache log files, and I usually end up using grep or awk on them. Each time, I have to carefully construct the right regular expression to get the data I want. I’ve done this often enough that I decided it was time to write a Python module to do it for me.
log_reader - a fast apache log reader in python (download)
Example:
>>> import log_reader
>>> reader = log_reader.ApacheReader(file('access.log'))
>>> reader.next()
{'username': '-', 'status': 200, 'ident': '-', 'tz': '-0500', 'protocol': 'HTTP/1.0', 'user-agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; iTreeSurf 3.6.1 (Build 056))', 'ips': ['123.123.123.123'], 'referer': 'Field blocked by Outpost (http://www.agnitum.com)', 'time': datetime.datetime(2005, 3, 3, 21, 37, 58), 'path': '/webnote/webnote', 'method': 'GET', 'size': 46472}
>>> status = [f['status'] for f in reader]
>>> status.count(200)  # request ok
14047
>>> status.count(404)  # file not found
159
It’s implemented as a C extension module for CPython, so it’s substantially faster than reading and parsing the strings in Python itself. A simple script reading a 17,089-line log file takes about 1.45 seconds on my 1.2 GHz laptop.
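For comparison, here is a minimal sketch of the kind of pure-Python parsing the module is meant to replace. It is not the log_reader implementation: the regular expression, the parse_line name, and the dictionary keys are assumptions made for illustration, it only handles the combined format, and it is written for current Python 3 rather than the Python 2 used in the example above. It also assumes English month abbreviations and does not handle escaped quotes inside fields.

```python
import re
from datetime import datetime

# Assumed pattern for the Apache "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<username>\S+) '
    r'\[(?P<time>[^\]]+?) (?P<tz>[+-]\d{4})\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Parse one combined-format line into a dict, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record['status'] = int(record['status'])
    # Apache logs "-" when no response body was sent.
    record['size'] = 0 if record['size'] == '-' else int(record['size'])
    # The timezone offset is kept separately in record['tz'].
    record['time'] = datetime.strptime(record['time'], '%d/%b/%Y:%H:%M:%S')
    return record


if __name__ == '__main__':
    sample = ('123.123.123.123 - - [03/Mar/2005:21:37:58 -0500] '
              '"GET /webnote/webnote HTTP/1.0" 200 46472 "-" "Mozilla/4.0"')
    print(parse_line(sample))
```

Doing this match and the type conversions in Python for every line is the per-line overhead that a C implementation avoids.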
The constructor takes either a filename or any iterable as a parameter.[1] Optionally, one can pass in an Apache format string (it defaults to the Apache combined format).
[1] Oddly, sequences fail PyIter_Check and need to be wrapped by iter. That is, log_reader.ApacheReader(['..']) fails, but log_reader.ApacheReader(iter(['..'])) works. I’m not sure what the distinction is, because lists are iterable and have __iter__ defined.
Wade Leftwich at May 10, 2006, 09:14am EDT
Excellent utility, thanks!
Regarding why log_reader.ApacheReader(['..']) fails: sequences don’t have a next() method.
In [1]: L = [1,2,3]
In [2]: L.next()
exceptions.AttributeError Traceback (most recent call last)
AttributeError: 'list' object has no attribute 'next'
In [3]: iter(L).next()
Out[3]: 1
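The same distinction between an iterable (has __iter__) and an iterator (also has a next method) can be checked from Python itself. The sketch below uses current Python 3, where the method is spelled __next__ and the abstract base classes live in collections.abc. The as_iterator helper is a hypothetical name used for illustration, not part of log_reader. PyIter_Check tests for the iterator case only, which is why a bare list is rejected.

```python
from collections.abc import Iterable, Iterator

lines = ['..']

# A list is iterable (it defines __iter__) but is not itself an iterator.
print(isinstance(lines, Iterable))
print(isinstance(lines, Iterator))

# Calling iter() returns an iterator object that does have __next__.
print(isinstance(iter(lines), Iterator))


def as_iterator(source):
    """Hypothetical helper: accept any iterable and return an iterator over it."""
    # iter() returns an iterator unchanged and raises TypeError for non-iterables.
    return iter(source)


print(next(as_iterator(lines)))
```

At the C level, the equivalent of calling iter() is PyObject_GetIter, so a constructor that wants to accept plain sequences could call that on its argument instead of requiring PyIter_Check to succeed.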