Reading random lines with Python

Technically, this isn’t random, since the lines are taken at roughly even intervals through the file, but it met my needs. I wanted to read an arbitrary number of lines from a 4GB text file to spot-check data we had loaded.

What is below does the following:

1) Get the file size in bytes
2) Open the file
3) Get the size of the file chunks we want to skip. This is the size of the file divided by how many lines we want to read
4) In a loop, seek to the next byte position: the current position plus the chunk size we calculated above
5) Read the rest of the line at that point (it is probably a partial line, so we throw it away), then read the next complete line
6) Repeat until we reach the end of the file
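
To make step 3 concrete, here is the arithmetic for a file of exactly 4 GiB. That exact size is an assumption for illustration (all I know about my file is "4GB"), and the variable names are made up for this sketch.

```python
# Assumed for illustration: a file of exactly 4 GiB and 10,000 wanted lines.
file_size = 4 * 1024**3          # 4,294,967,296 bytes
lines_wanted = 10_000

# Integer division gives the number of bytes to skip between reads.
chunk = file_size // lines_wanted
print(chunk)
```

So each seek jumps a little over 400 KB ahead of where the previous read finished.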

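The code below reads roughly 10,000 lines from a file (slightly fewer, because every read also moves the position forward by the length of the lines it consumed):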

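import os

path = "myfile.txt"
size = os.path.getsize(path)      # file size in bytes
step = max(size // 10000, 1)      # bytes to skip between samples

# Binary mode, so seek() and tell() work on plain byte offsets
with open(path, "rb") as f:
    pos = 0
    while pos + step < size:
        f.seek(pos + step)
        f.readline()              # throw away the rest of the partial line
        line = f.readline()       # the next complete line
        if not line:
            break                 # we ran off the end of the file
        # decode assuming UTF-8; change this to match your file
        tmp = line.decode("utf-8").rstrip("\r\n").split("|")
        # do something with tmp, which stores the fields of the "random" line
        pos = f.tell()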
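
If you want to reuse this, the same loop can be wrapped in a generator. This is only a sketch: the name `sample_lines` and its `n` and `encoding` parameters are my own invention for illustration, and it assumes the file's encoding is one where a newline is always a single `\n` byte (UTF-8 or ASCII, for example).

```python
import os

def sample_lines(path, n, encoding="utf-8"):
    """Yield roughly n evenly spaced complete lines from the file at path.

    Hypothetical helper: may yield slightly fewer than n lines, because
    each read moves the position forward by more than one chunk.
    """
    size = os.path.getsize(path)
    step = max(size // n, 1)            # bytes to skip between samples
    with open(path, "rb") as f:         # binary mode: offsets are bytes
        pos = 0
        while pos + step < size:
            f.seek(pos + step)
            f.readline()                # discard the partial line we landed in
            line = f.readline()         # the next complete line
            if not line:
                break                   # reached the end of the file
            yield line.decode(encoding).rstrip("\r\n")
            pos = f.tell()
```

Looping over `sample_lines("myfile.txt", 10000)` and calling `.split("|")` on each result should give the same fields as the loop above. Note that the very first line of the file is never returned, since every sample starts by discarding the line it lands in.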
