Is there a Python file type for accessing random lines without traversing the whole file? I need to search within a large file, and reading the whole thing into memory isn't possible.
Any types or methods would be appreciated.
Since lines can be of arbitrary length, you really can't get at a random line (whether you mean "a line whose number is actually random" or "a line with an arbitrary number, selected by me") without traversing the whole file.
If kinda-sorta-random is enough, you can seek to a random place in the file and then read forward until you hit a line terminator. But that's useless if you want to find (say) line number 1234, and will sample lines non-uniformly if you actually want a randomly chosen line.
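The "seek somewhere and read forward" idea can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `roughly_random_line` is made up here, the file is assumed to be UTF-8 text with `\n` line endings, and wrapping around to the first line at end of file is one possible choice among several.

```python
import os
import random


def roughly_random_line(path, rng=random):
    """Return the first complete line after a random byte offset.

    Longer lines are more likely to contain the chosen offset, so the line
    that follows a long line is over-represented in the sample.
    """
    size = os.path.getsize(path)
    if size == 0:
        return ""
    with open(path, "rb") as f:          # binary mode: offsets are plain bytes
        f.seek(rng.randrange(size))
        f.readline()                     # discard the (probably partial) line
        line = f.readline()
        if not line:                     # ran off the end: wrap to the first line
            f.seek(0)
            line = f.readline()
    return line.decode("utf-8", errors="replace")
```

Only a couple of lines are ever read, so memory use stays small no matter how large the file is, but as noted above the result is not a uniform sample and it cannot fetch a specific line number.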
Yes, you can easily get a random line. Just seek to a random position in the file, then seek towards the beginning until you hit a \n or the beginning of the file, then read a line.
Code:
import random
import sys

# Binary mode: relative seeks are not allowed on text-mode files in Python 3
with open(sys.argv[1], "rb") as f:
    f.seek(0, 2)  # seek to end of file
    size = f.tell()
    f.seek(int(size * random.random()))
    # Now seek backward until we hit a \n or the beginning of the file
    while f.tell() > 0:
        f.seek(-1, 1)
        if f.read(1) == b"\n":
            break
        f.seek(-1, 1)
    # Now get a line
    print(f.readline().decode())
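You can use linecache:
import linecache
print(linecache.getline("your_file.txt", randomLineNumber))  # Note: first line is 1, not 0
Be aware that linecache reads and caches all of the file's lines in memory the first time you ask for one, so it may not suit a file that is too large to load.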
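For reference, here is a small self-contained run of `linecache.getline`. The temporary file and its three lines are made up purely for the demonstration; nothing here comes from the original poster's data.

```python
import linecache
import os
import tempfile

# Build a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

third = linecache.getline(path, 3)       # line numbers start at 1
missing = linecache.getline(path, 99)    # out of range: returns "" rather than raising
print(repr(third), repr(missing))

linecache.clearcache()                   # drop the cached copy of the file's lines
os.remove(path)
```

Note that `getline` reports a missing line or unreadable file as an empty string instead of an exception, so an empty result needs to be checked explicitly.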
This seems like just the sort of thing mmap was designed for. An mmap object creates a string-like interface to a file.
In case you were wondering, mmap objects can also be assigned to.
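The code that originally accompanied this answer is not available, so what follows is only a minimal sketch of the two points, using a made-up temporary file. It shows the file-like and string-like reads first, then a slice assignment that writes back to the file.

```python
import mmap
import os
import tempfile

# Throwaway file for the demonstration (contents are made up).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
    path = tmp.name

with open(path, "r+b") as f:                 # r+b so the mapping is writable
    with mmap.mmap(f.fileno(), 0) as mm:     # length 0 maps the whole file
        first = mm.readline()                # file-like read from the current position
        pos = mm.find(b"third")              # string-like search, returns a byte offset
        chunk = mm[pos:pos + 5]              # slicing returns bytes
        mm[0:5] = b"FIRST"                   # slice assignment; the length must not change
        mm.flush()                           # push the change back to the file

with open(path, "rb") as f:
    after = f.read()
os.remove(path)
```

The operating system pages data in on demand, so searching with `find` does not require loading the whole file into Python's memory up front. The mapping works in bytes, so text needs to be encoded or decoded explicitly.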