Is there a Python file type for accessing random lines without traversing the whole file? I need to search within a large file, reading the whole thing into memory wouldn't be possible.
Any types or methods would be appreciated.
Is there a Python file type for accessing random lines without traversing the whole file? I need to search within a large file, reading the whole thing into memory wouldn't be possible.
Any types or methods would be appreciated.
On
Since lines can be of arbitrary length, you really can't get at a random line (whether you mean "a line whose number is actually random" or "a line with an arbitrary number, selected by me") without traversing the whole file.
If kinda-sorta-random is enough, you can seek to a random place in the file and then read forward until you hit a line terminator. But that's useless if you want to find (say) line number 1234, and will sample lines non-uniformly if you actually want a randomly chosen line.
On
Yes, you can easily get a random line. Just seek to a random position in the file, then seek towards the beginning until you hit a \n or the beginning of the file, then read a line.
Code:
import sys,random
with open(sys.argv[1],"r") as f:
f.seek(0,2) # seek to end of file
bytes = f.tell()
f.seek(int(bytes*random.random()))
# Now seek forward until beginning of file or we get a \n
while True:
f.seek(-2,1)
ch = f.read(1)
if ch=='\n': break
if f.tell()==1: break
# Now get a line
print f.readline()
On
You can use linecache:
import linecache
print linecache.getline(your_file.txt, randomLineNumber) # Note: first line is 1, not 0
This seems like just the sort of thing
mmapwas designed for. Ammapobject creates a string-like interface to a file:In case you were wondering,
mmapobjects can also be assigned to: