Reading random lines with Python

Technically, this isn’t random: it samples lines at roughly even intervals through the file. But it met my needs. I wanted to read an arbitrary number of lines from a 4GB text file to spot-check data we had loaded.

The script does the following:

1) Get the file size in bytes
2) Open the file
3) Work out the size of the chunks we want to skip. This is the file size divided by how many lines we want to read
4) In a loop, seek to the next byte position: the current position plus the chunk size calculated above
5) Read and discard the rest of the line at that point (the seek usually lands mid-line), then read the next complete line
6) Repeat until the end of the file
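To get a feel for step 3, here is the chunk-size arithmetic on its own. This is only a sketch: the post just says "4GB", so taking that as 4 GiB is an assumption, and the variable names are illustrative.

```python
# Assumption: "4GB" is taken as 4 GiB; a real run would get the size from the file itself
file_size = 4 * 1024**3
lines_wanted = 10000

# Integer division gives the number of bytes to skip between samples
chunk = file_size // lines_wanted
print(chunk)  # 429496
```

So with these numbers each seek jumps a little over 400 KB ahead of the previous sampled line.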

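What is below reads about 10,000 lines from a file (slightly fewer in practice, because each step also advances past the lines it has just read):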

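import os

path = "myfile.txt"
size = os.path.getsize(path)  # file size in bytes
count = size // 10000         # bytes to skip between sampled lines

# Binary mode: seeking to arbitrary byte offsets is not reliable in text mode
with open(path, "rb") as f:
    i = 0
    while i + count < size:
        f.seek(i + count)
        f.readline()          # discard the partial line we landed in
        line = f.readline()   # the next complete line
        if not line:
            break             # ran off the end of the file
        # decoding as UTF-8 is an assumption; adjust to the file's encoding
        tmp = line.decode("utf-8").rstrip("\r\n").split("|")
        # do something with the tmp variable that stores the "random" line
        i = f.tell()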
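If you want to reuse this, the same approach could be wrapped in a generator. The following is a minimal sketch, not a polished utility: the function name `sample_lines`, its parameters, and the UTF-8 default are all assumptions made for illustration.

```python
import os

def sample_lines(path, n, sep="|", encoding="utf-8"):
    """Yield roughly n evenly spaced lines from path, each split on sep.

    Hypothetical helper: name, parameters and defaults are illustrative.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    size = os.path.getsize(path)   # file size in bytes
    step = size // n               # bytes to skip between samples
    # Binary mode so that seek() and tell() work on plain byte offsets
    with open(path, "rb") as f:
        pos = 0
        while pos + step < size:
            f.seek(pos + step)
            f.readline()           # discard the partial line we landed in
            line = f.readline()    # the next complete line
            if not line:
                break              # no complete line left before end of file
            yield line.decode(encoding).rstrip("\r\n").split(sep)
            pos = f.tell()
```

It could then be used as `for fields in sample_lines("myfile.txt", 10000): ...`. Because it is a generator, only one sampled line is held in memory at a time, which matters on a multi-gigabyte file.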
