I wrote the following code to rewrite a text file in a given order. The order is specified in gA, which is a list of the form [[fN0, value0], [fN1, value1], ...]. I sorted this list by value and want to write the lines out in that order.
My code works, but it is very slow on my input (I have an input with 50m rows, and it would take 2 months to process). I am therefore looking for ways to speed this code up. Any idea is welcome.
for k in gA:
    fN = k[0]
    for lineNum, line in enumerate(slicedFile, start=0):
        num, restOfLine = line.split('\t', 1)
        if num == fN:
            out.write(line)
    inp.seek(0)
You should read the whole file into memory and put all lines in a dict of num pointing at a list of lines having that num at the beginning. Then you can iterate once through gA and write out all lines from that dict.
Note: I'm using defaultdict just to shorten the code. If a non-existing element is used in such a defaultdict, it gets created automatically (in this case a list), so I can just call .append() on the element.
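The approach described above could be sketched roughly as follows. This is a minimal illustration, not the original answer's code: the function name write_in_order and the sample data are hypothetical, and it assumes the whole file fits in memory, that every line contains a tab, and that the keys in gA are strings matching the first field exactly (as the num == fN comparison in the question implies).

```python
from collections import defaultdict
import io


def write_in_order(lines, gA, out):
    """Write lines to out, grouped in the order of the keys listed in gA.

    lines: iterable of text lines of the form "<num>\t<rest>"
    gA:    list of [fN, value] pairs, already sorted by value
    out:   any object with a write() method
    """
    # One pass over the input: map each leading number to its lines.
    lines_by_num = defaultdict(list)
    for line in lines:
        num, _rest = line.split('\t', 1)
        lines_by_num[num].append(line)

    # One pass over gA: look each key up instead of rescanning the file.
    for fN, _value in gA:
        # .get() avoids creating empty lists for keys missing from the file.
        for line in lines_by_num.get(fN, ()):
            out.write(line)


# Small in-memory demonstration with made-up data.
sample = io.StringIO("2\tb\n1\ta\n2\tc\n")
result = io.StringIO()
write_in_order(sample, [["1", 0.1], ["2", 0.5]], result)
print(result.getvalue(), end="")
```

This reads the input once and then walks gA once, so the work grows with the number of lines plus the number of keys rather than with their product, which is what makes the nested-loop version so slow. Lines with the same num keep their original relative order.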