 

by Tony Chang
tony@ponderer.org

All opinions on this site are my own and do not represent those of my employer.

Creative Commons Attribution License

python log reader

Mar 05, 2006, 11:37pm EST

 

 

I often want to do some quick analysis of my Apache log files and find myself reaching for grep or awk. Each time, I have to carefully construct the right regular expression to pull out the data I want. I’ve done this often enough that I decided it was time to write a Python module to do it for me.

log_reader - a fast apache log reader in python (download)

Example:

>>> import log_reader
>>> reader = log_reader.ApacheReader(file('access.log'))
>>> reader.next()
{'username': '-', 'status': 200, 'ident': '-', 'tz':
'-0500', 'protocol': 'HTTP/1.0', 'user-agent': 'Mozilla/4.0
(compatible; MSIE 6.0; Windows 98; iTreeSurf 3.6.1 (Build
056))', 'ips': ['123.123.123.123'], 'referer': 'Field blocked
by Outpost (http://www.agnitum.com)', 'time':
datetime.datetime(2005, 3, 3, 21, 37, 58), 'path':
'/webnote/webnote', 'method': 'GET', 'size': 46472}
>>> status = [f['status'] for f in reader]
>>> status.count(200) # request ok
14047
>>> status.count(404) # file not found
159
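Each call to status.count() above walks the whole list again. As a small sketch of an alternative, collections.Counter tallies every status code in a single pass. The records list here is hypothetical sample data shaped like the dicts in the session above, not output from log_reader.

```python
from collections import Counter

# Hypothetical records, shaped like the dicts the reader yields above.
records = [
    {'status': 200, 'path': '/webnote/webnote'},
    {'status': 404, 'path': '/missing'},
    {'status': 200, 'path': '/'},
]

# One pass over the records, counting every status code at once.
status_counts = Counter(rec['status'] for rec in records)

print(status_counts[200])  # 2
print(status_counts[404])  # 1
print(status_counts[500])  # 0 (a Counter returns 0 for keys it never saw)
```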

It’s implemented as a C extension module for CPython, so it’s substantially faster than reading and parsing the strings in Python itself. A simple script reading a 17,089-line log file takes about 1.45 seconds on my 1.2 GHz laptop.
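For comparison, here is roughly what the pure-Python approach looks like. This is a minimal sketch in Python 3 using a regular expression for the Apache combined format; it is not how log_reader works internally, and the names COMBINED, SAMPLE, and parse_line are made up for illustration. The sample line is reconstructed from the record shown in the example above, and the keys differ slightly from log_reader's (a single 'ip' string instead of an 'ips' list, and 'user_agent' with an underscore, because regex group names cannot contain hyphens).

```python
import re
from datetime import datetime

# Apache combined format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<username>\S+) '
    r'\[(?P<time>[^\]]+?) (?P<tz>[+-]\d{4})\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Sample line reconstructed from the record shown in the post.
SAMPLE = (
    '123.123.123.123 - - [03/Mar/2005:21:37:58 -0500] '
    '"GET /webnote/webnote HTTP/1.0" 200 46472 '
    '"Field blocked by Outpost (http://www.agnitum.com)" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; '
    'iTreeSurf 3.6.1 (Build 056))"'
)

def parse_line(line):
    """Parse one combined-format line into a dict, or return None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec['status'] = int(rec['status'])
    # Apache logs "-" when no response body was sent.
    rec['size'] = 0 if rec['size'] == '-' else int(rec['size'])
    # %b expects English month abbreviations (the default C locale).
    rec['time'] = datetime.strptime(rec['time'], '%d/%b/%Y:%H:%M:%S')
    return rec

rec = parse_line(SAMPLE)
print(rec['status'], rec['size'], rec['path'])
```

A parser like this is easy to adapt, but it does the regex match and type conversions for every line in the interpreter, which is the overhead a C implementation avoids.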

The constructor takes either a filename or any iterable as a parameter.[1] Optionally, you can pass in an Apache format string; the default is the Apache combined format.

[1] Oddly, sequences fail PyIter_Check and need to be wrapped by iter. That is, log_reader.ApacheReader(['..']) fails, but log_reader.ApacheReader(iter(['..'])) works. I’m not sure what the distinction is because lists are iterable and have __iter__ defined.

Wade Leftwich at May 10, 2006, 09:14am EDT

Excellent utility, thanks!

As for why log_reader.ApacheReader(['..']) fails: sequences don’t have a next() method.

In [1]: L = [1,2,3]

In [2]: L.next()
exceptions.AttributeError                Traceback (most recent call last)

AttributeError: 'list' object has no attribute 'next'

In [3]: iter(L).next()
Out[3]: 1
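The distinction here is between an iterable (something iter() accepts, such as a list) and an iterator (the object that actually produces the next item). PyIter_Check tests for the latter. A rough Python-level analogue, sketched in Python 3, where the method is spelled __next__ and called through next(), uses the abstract base classes in collections.abc. The as_iterator helper is hypothetical; it only illustrates the wrapping that a constructor could do itself (in C, PyObject_GetIter plays the role of iter()).

```python
from collections.abc import Iterable, Iterator

lines = ['..']

# A list is iterable: it defines __iter__ ...
print(isinstance(lines, Iterable))   # True
# ... but it is not an iterator: it has no __next__ method.
print(isinstance(lines, Iterator))   # False

# iter() returns an iterator over the list.
it = iter(lines)
print(isinstance(it, Iterator))      # True
print(next(it))                      # ..

def as_iterator(source):
    """Hypothetical helper: accept any iterable and return an iterator."""
    # Calling iter() on an iterator returns the iterator itself,
    # so this is safe for both lists and already-wrapped inputs.
    return iter(source)

print(next(as_iterator(['a', 'b'])))        # a
print(next(as_iterator(iter(['a', 'b']))))  # a
```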