{"id":3404,"date":"2014-01-03T10:47:22","date_gmt":"2014-01-03T15:47:22","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=3404"},"modified":"2014-01-03T10:47:22","modified_gmt":"2014-01-03T15:47:22","slug":"using-strace-to-determine-progress-when-processing-a-file","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2014\/01\/03\/using-strace-to-determine-progress-when-processing-a-file\/","title":{"rendered":"Using strace to determine progress when processing a file"},"content":{"rendered":"<p>I thought this was interesting, as I stumbled upon it.  I find the most useful things often result from stumbling \ud83d\ude09<\/p>\n<p>I had a job running that was gunzip&#8217;ing a file and piping the output to a python script.  The uncompressed file was a very large HTTP access log.  Access logs are written in chronological order, so its lines were already sorted by time.  I found that running strace against the python process and watching its read() system calls would show the first 32 bytes of each buffer read from stdin (32 bytes is strace&#8217;s default string limit), and those bytes often included a timestamp.  By grepping that output for the year, I could tell how far the job had progressed through the file.<\/p>\n<pre lang=\"text\">\r\n-bash-4.1$ ps -fC bash | grep gunzip\r\nsa-jboss  9351  9350  0 22:15 ?        
00:00:00 bash -c gunzip -c \/var\/log\/jbossas\/logs2\/localhost_access_log.2013-11-29.log.gz | python lifetime2.py\r\n-bash-4.1$ strace -p $(pgrep python) -e trace=read 2>&1 | grep \"\/2013:\"\r\nread(0, \"v\/2013:09:10:56 -0500] \\\"POST \/ch\"..., 8192) = 8192\r\nread(0, \"ov\/2013:09:10:56 -0500] \\\"POST \/c\"..., 8192) = 8192\r\nread(0, \".186.39 - [29\/Nov\/2013:09:10:57 \"..., 8192) = 8192\r\nread(0, \".0.1 - [29\/Nov\/2013:09:10:58 -05\"..., 8192) = 8192\r\nread(0, \".0.1 - [29\/Nov\/2013:09:11:02 -05\"..., 8192) = 8192\r\nread(0, \"[29\/Nov\/2013:09:11:07 -0500] \\\"PO\"..., 4096) = 4096\r\nread(0, \"29\/Nov\/2013:09:11:09 -0500] \\\"POS\"..., 8192) = 8192\r\nread(0, \"[29\/Nov\/2013:09:11:11 -0500] \\\"PO\"..., 4096) = 4096\r\nread(0, \"5 6\\n24.205.72.10 - [29\/Nov\/2013:\"..., 4096) = 4096\r\nread(0, \"0\\n98.14.99.208 - [29\/Nov\/2013:09\"..., 8192) = 8192\r\nread(0, \"3.138.165 - [29\/Nov\/2013:09:11:1\"..., 8192) = 8192\r\nread(0, \"Nov\/2013:09:11:17 -0500] \\\"POST \/\"..., 8192) = 8192\r\nread(0, \".131.87.120 - [29\/Nov\/2013:09:11\"..., 8192) = 8192\r\n-bash-4.1$\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I thought this was interesting, as I stumbled upon it. I find the most useful things often result from stumbling \ud83d\ude09 I had a job running that was gunzip&#8217;ing a file and piping the output to a python script. 
The&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2014\/01\/03\/using-strace-to-determine-progress-when-processing-a-file\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[14],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3404"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=3404"}],"version-history":[{"count":5,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3404\/revisions"}],"predecessor-version":[{"id":3409,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/3404\/revisions\/3409"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=3404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=3404"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=3404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}