Apache log shell scripts

Look for bytes returned > 1,000,000

We were tracking down a bug that was returning far too much data, so we needed a way to find log records for responses larger than a million bytes. David Choi figured this out.


cat access_log* | grep browseinst | awk -F\" '{ print $1" ["$2"] ["$6"] "$3 }' | grep browseinst | awk '{if ($NF > 1000000) print $0}'

Explanation

    1. cat access_log* – feed the contents of all access_log files into the next part of the pipeline
    2. grep browseinst – keep only lines containing “browseinst”
    3. awk -F\" '{ print $1" ["$2"] ["$6"] "$3 }' – split each line into fields delimited by double quotes, then print only the 1st, 2nd, 6th, and 3rd fields, in that order, which drops the referrer
    4. grep browseinst – filter for “browseinst” again, because the first grep also matched lines where it appeared only in the referrer field, which we don’t want
    5. awk '{if ($NF > 1000000) print $0}' – if the last field (bytes returned) is greater than 1 million, print the whole line
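The five steps above can also be collapsed into a single awk call. This is a minimal sketch, not the command we actually ran: the function name big_responses is made up for illustration, and it assumes the Apache combined log format (request, referrer, and user agent each wrapped in double quotes). Unlike the original pipeline, it matches “browseinst” in the request field only, and it prints the status and byte count without the extra padding.

```shell
# Hypothetical helper (name is an assumption): print browseinst requests
# whose response size exceeds 1,000,000 bytes.
# Usage: big_responses access_log*
big_responses() {
    awk -F'"' '
        # With double quotes as the separator:
        #   $1 = client, identity, user, timestamp
        #   $2 = request line, $3 = " status bytes ", $6 = user agent
        $2 ~ /browseinst/ {
            # A single-space separator makes split() ignore surrounding blanks.
            n = split($3, sb, " ")      # sb[1] = status, sb[n] = bytes
            if (n >= 2 && sb[n] + 0 > 1000000)
                print $1 "[" $2 "] [" $6 "] " sb[1] " " sb[n]
        }
    ' "$@"
}
```

Doing the match on $2 removes the need for the second grep, since the referrer never enters the comparison.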

(Note: try pulling it apart and building it back up, looking at the output at each step. That’s the only way I could make sense of it.)
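One way to pull it apart is to run each stage on a single line and look at what comes out. The sketch below uses a made-up sample line in Apache combined log format (the IP address, referrer, and user agent are placeholders, not real log data) and saves each stage in a variable so it can be inspected.

```shell
# Made-up sample line in combined log format, for illustration only.
sample='10.0.0.1 - - [14/Jul/2010:10:25:18 -0700] "GET /?page=browseinst HTTP/1.1" 200 1026117 "http://example.com/" "ExampleAgent/1.0"'

# Stage 1: number the fields awk sees when splitting on double quotes.
fields=$(printf '%s\n' "$sample" |
    awk -F'"' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"

# Stage 2: reassemble fields 1, 2, 6, and 3 as the original command does.
stage2=$(printf '%s\n' "$sample" |
    awk -F'"' '{ print $1" ["$2"] ["$6"] "$3 }')
printf '%s\n' "$stage2"

# Stage 3: keep the line only if its last whitespace-separated field
# (the byte count) is greater than 1,000,000.
stage3=$(printf '%s\n' "$stage2" |
    awk '{ if ($NF > 1000000) print $0 }')
printf '%s\n' "$stage3"
```

Stage 1 is the most useful one to look at: it shows that the referrer lands in field 4 and the user agent in field 6, which is why the original command prints $6 and skips $4.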

Output


128.97.62.186 - - [14/Jul/2010:10:25:18 -0700] [GET /?page=browseinst&term=101&lastalpha=I&instructor=2 HTTP/1.1] [Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 ( .NET CLR 3.5.30729)] 200 1026117
128.97.198.33 - - [14/Jul/2010:22:48:16 -0700] [GET /?page=browseinst&term=101&lastalpha=T&instructor=1207888 HTTP/1.1] [Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6] 200 1059649
