I have a text file that has 120000 lines, where every line has exactly this format: ean_code;plu;name;price;state
I tried various approaches, including working with the file directly, and got the best results when the file was simply loaded into memory with readlines() and stored in a list (at the start of the program).
So I have these two lines:
matcher = re.compile('^(?:'+eanic.strip()+'(?:;|$)|[^;]*;'+eanic.strip()+'(?:;|$))').match
line=[next(l.split(';') for l in list if matcher(l))]
do sth with line....
These lines try to find, as fast as possible, the record whose ean_code or plu field matches a code entered by the user.
I am particularly interested in the second line, as it impacts performance on my WinCE device (PyCE port of Python 2.5).
I tried every solution I could think of to make it faster, but this is the fastest way I have found to iterate through the list and find a line that the compiled regex matches.
Is there a faster way to iterate over a big list (120000 lines in my case) than a for ... in comprehension?
I am looking for any approach, with any kind of data structure (supported up to Python 2.5), that will give me a faster result than the two lines above.
Just to mention, this runs on a handheld device (630 MHz ARM) with 256 MB RAM and no connectivity besides USB. Sadly, using a database is not an option.
I made a test file and tested a few variations. The fastest way to search for a static string (as you appear to be doing) while iterating over the file is a plain `string in line` test. However, if you'll be searching the loaded data more than once (actually more than 30 times, according to the test numbers below), it's worth your (computational) time to build lookup tables for the PLUs and EANs in the form of dicts and use these for future searches. Test code follows:
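Before the timings, here is a minimal sketch of the two approaches on their own. The sample records and the helper names (find_by_scan, build_indexes, find_by_index) are made up for illustration and are not part of your program. The sketch assumes the EAN is the first field and the PLU the second, as in your format, and it avoids anything newer than Python 2.5.

```python
# Sample records in the question's format: ean_code;plu;name;price;state
# (values are invented for illustration only).
lines = [
    "4006381333931;1001;Pencil;0.50;A\n",
    "5901234123457;1002;Notebook;2.30;A\n",
    "9780201379624;1003;Marker;1.20;N\n",
]


def find_by_scan(lines, code):
    """Linear scan: cheap substring test first, exact field check second."""
    for line in lines:
        if code in line:
            fields = line.rstrip("\r\n").split(";")
            # The substring may also occur in name/price, so confirm the field.
            if fields[0] == code or fields[1] == code:
                return fields
    return None


def build_indexes(lines):
    """One pass over the data, building ean -> line and plu -> line dicts."""
    by_ean = {}
    by_plu = {}
    for line in lines:
        parts = line.split(";", 2)  # only the first two fields are needed
        by_ean[parts[0]] = line     # values reuse the already loaded strings
        by_plu[parts[1]] = line
    return by_ean, by_plu


def find_by_index(by_ean, by_plu, code):
    """Dict lookup; the EAN table is tried first, then the PLU table."""
    line = by_ean.get(code)
    if line is None:
        line = by_plu.get(code)
    if line is None:
        return None
    return line.rstrip("\r\n").split(";")


# Build once at program start, then reuse for every user query.
by_ean, by_plu = build_indexes(lines)
```

Note one difference from the scan: if some PLU happens to equal another record's EAN, find_by_index always prefers the EAN match, while the scan returns whichever line comes first. The dicts store the original line strings rather than split lists, which should keep the extra memory modest, though I have not measured that on a device like yours.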