{"id":4718,"date":"2015-02-17T14:59:33","date_gmt":"2015-02-17T19:59:33","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=4718"},"modified":"2015-02-17T14:59:33","modified_gmt":"2015-02-17T19:59:33","slug":"reading-random-lines-with-python","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2015\/02\/17\/reading-random-lines-with-python\/","title":{"rendered":"Reading random lines with python"},"content":{"rendered":"<p>Technically, this isn&#8217;t random, but it met my needs.  I wanted to read an arbitrary number of lines from a 4GB text file to spot check data we had loaded.<\/p>\n<p>The script below does the following:<\/p>\n<p>1) Get the file size<br \/>\n2) Open the file<br \/>\n3) Calculate the size of the file chunks we want to skip: the size of the file divided by how many lines we want to read<br \/>\n4) In a loop, seek to the next byte position: the current position plus the chunk size calculated above<br \/>\n5) Discard the rest of the (probably partial) line at that point, then read the next complete line<br \/>\n6) Repeat until the end of the file is reached<\/p>\n<p>The script below reads roughly 10,000 evenly spaced lines from a file&#8230;<\/p>\n<pre>\r\nimport os\r\ns = os.path.getsize(\"myfile.txt\")  # file size in bytes\r\n# chunk size to skip between samples; at least 1 so small files still advance\r\ncount = max(1, s \/\/ 10000)\r\n# binary mode, so seek() and tell() work on plain byte offsets\r\nwith open(\"myfile.txt\", \"rb\") as f:\r\n  i = 0\r\n  while i < s:\r\n    f.seek(i + count)\r\n    f.readline()  # discard the rest of the line we landed in\r\n    line = f.readline()\r\n    if not line:\r\n      break  # reached the end of the file\r\n    # assumes the file is UTF-8 text; adjust the encoding if needed\r\n    tmp = line.decode(\"utf-8\").split(\"|\")\r\n    #do something with the tmp variable that stores the \"random\" line\r\n    i = f.tell()\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Technically, this isn&#8217;t random, but it met my needs. I wanted to read an arbitrary number of lines from a 4GB text file to spot check data we had loaded. 
What is below does the following: 1) get the file&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2015\/02\/17\/reading-random-lines-with-python\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[8],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/4718"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=4718"}],"version-history":[{"count":2,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/4718\/revisions"}],"predecessor-version":[{"id":4724,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/4718\/revisions\/4724"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=4718"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=4718"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=4718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}