Using strace to determine progress when processing a file

I thought this was interesting, as I stumbled upon it. I find the most useful things often result from stumbling 😉

I had a job running that was gunzip’ing a file and piping the output to a python script. The unzipped file was a very large HTTP access log, so it was sorted by time as that is how it is created. I found that running strace against the python process and looking for read() system calls would show the first few bytes of each line, which included the timestamp. By using this, I could tell where I was in terms of processing the file.

-bash-4.1$ ps -fC bash | grep gunzip
sa-jboss  9351  9350  0 22:15 ?        00:00:00 bash -c gunzip -c /var/log/jbossas/logs2/localhost_access_log.2013-11-29.log.gz | python lifetime2.py
-bash-4.1$ strace -p $(pgrep python) -e trace=read 2>&1 | grep "/2013:"
read(0, "v/2013:09:10:56 -0500] \"POST /ch"..., 8192) = 8192
read(0, "ov/2013:09:10:56 -0500] \"POST /c"..., 8192) = 8192
read(0, ".186.39 - [29/Nov/2013:09:10:57 "..., 8192) = 8192
read(0, ".0.1 - [29/Nov/2013:09:10:58 -05"..., 8192) = 8192
read(0, ".0.1 - [29/Nov/2013:09:11:02 -05"..., 8192) = 8192
read(0, "[29/Nov/2013:09:11:07 -0500] \"PO"..., 4096) = 4096
read(0, "29/Nov/2013:09:11:09 -0500] \"POS"..., 8192) = 8192
read(0, "[29/Nov/2013:09:11:11 -0500] \"PO"..., 4096) = 4096
read(0, "5 6\n24.205.72.10 - [29/Nov/2013:"..., 4096) = 4096
read(0, "0\n98.14.99.208 - [29/Nov/2013:09"..., 8192) = 8192
read(0, "3.138.165 - [29/Nov/2013:09:11:1"..., 8192) = 8192
read(0, "Nov/2013:09:11:17 -0500] \"POST /"..., 8192) = 8192
read(0, ".131.87.120 - [29/Nov/2013:09:11"..., 8192) = 8192
-bash-4.1$

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.