I'm using python sqlite3 package to maintain mid-size project containing more than 5 mil rows. At some moment I have to use external databases, that should update my rows by same id. Using classic cursor.execute taking so long time to execute so... here is the question - how to make it faster?
Which is fastest way to update rows by id in sqlite python?
420 views Asked by Yarik Shevtsov At
2
There are 2 answers
0
On
Like any other relational db engines, SQLite can use indexes to speed up queries. When you create a table with a primary key, that column automatically gets an index on it. When you later use the update query via the temp table, SQLite internally joins both table using the primary key index of TEMP. So it only performs one single full scan on YOUR_TABLE and then for each record from YOUR_TABLE gets the corresponding record from TEMP via the index.
Said differently, if you have to often locate records via one column (or a limited number of columns) you should ensure that the table has an index on those columns.
Here is my studying of optimizing update data info by id with python sqlite3.
There are 3 levels of performance:
Basic execution level means you are using simple cursor.execute commands with sql and parameters inside this. But if you try this with any loop (for, while etc.):
your performance will be very poor because we call execute command every time. Even updating 10000 rows will take like 1 hour time. And running this really annoying sometimes:)
Advanced level means you are using cursor.executemany commands with sql and list of tuples to commit. In that case, we are sending only one request, and an array of arguments in it. The code looks like:
Performance of this will be faster by 70% - 18 mins, in my case. But still that's not fast enough
After all that executing's and understanding that maybe where is another way, solution found - updating from temporary table. Executing that kind of query takes only 5 sec to update 10000 rows!
Conclusion
By now, way-3 is best in performance to update values by id. By some reason, i didn't find step-by-step explanation like above, and that's why i decided to write that little research. In addition to performance article, we can also use this tweaks after connecting to database:
More detailed info about pragma you can find here:
https://www.sqlite.org/pragma.html#pragma_cache_size
https://www.sqlite.org/pragma.html#pragma_temp_store