Which is fastest way to update rows by id in sqlite python?

420 views Asked by At

I'm using python sqlite3 package to maintain mid-size project containing more than 5 mil rows. At some moment I have to use external databases, that should update my rows by same id. Using classic cursor.execute taking so long time to execute so... here is the question - how to make it faster?

2

There are 2 answers

0
Yarik Shevtsov On

Here is my studying of optimizing update data info by id with python sqlite3.

There are 3 levels of performance:

  1. Basic

Basic execution level means you are using simple cursor.execute commands with sql and parameters inside this. But if you try this with any loop (for, while etc.):

to_sql=[[10,1],[20,2],[30,3]]
for val,id in to_sql:
    cursor.execute("UPDATE YOUR_TABLE SET val = ? WHERE id = ?",(val,id))
cursor.commit()

your performance will be very poor because we call execute command every time. Even updating 10000 rows will take like 1 hour time. And running this really annoying sometimes:)

  1. Advanced

Advanced level means you are using cursor.executemany commands with sql and list of tuples to commit. In that case, we are sending only one request, and an array of arguments in it. The code looks like:

to_sql = [(10,1),(20,2),(30,3)] #etc
cursor.executemany("UPDATE YOUR_TABLE SET val =? WHERE id = ?",to_sql)
cursor.commit()

Performance of this will be faster by 70% - 18 mins, in my case. But still that's not fast enough

  1. Best way

After all that executing's and understanding that maybe where is another way, solution found - updating from temporary table. Executing that kind of query takes only 5 sec to update 10000 rows!

to_sql=[(10,1),(20,2),(30,3)]
cursor.execute("CREATE TABLE Temp (value TEXT, id TEXT PRIMARY KEY)") #creating our temp table
cursor.executemany("INSERT INTO Temp VALUES(?,?)",to_sql) #inserting values and id what we want to update
cursor.execute("UPDATE YOUR_TABLE SET(column)=(Temp.value) FROM Temp WHERE YOUR_TABLE.id = Temp.id")
#That's where magic - we doing all executing in one request and inside sqlite3 interface
cursor.execute("DROP TABLE TEMP") - If we are going to update many tables in one connection we must create and delete our Temp table
cursors.commit()

Conclusion

By now, way-3 is best in performance to update values by id. By some reason, i didn't find step-by-step explanation like above, and that's why i decided to write that little research. In addition to performance article, we can also use this tweaks after connecting to database:

sql_con=sqlite3.connect("YOUR_DATABASE_NAME.db")
sql_con.cursor.execute("PRAGMA cache_size = 100000") # means we will set some cache size for our database which further increase our execution performance
sql_con.cursor.execute("PRAGMA temp_store = MEMORY") # means we will store temporary info in RAM instead of hard disk

More detailed info about pragma you can find here:

https://www.sqlite.org/pragma.html#pragma_cache_size

https://www.sqlite.org/pragma.html#pragma_temp_store

0
Serge Ballesta On

Like any other relational db engines, SQLite can use indexes to speed up queries. When you create a table with a primary key, that column automatically gets an index on it. When you later use the update query via the temp table, SQLite internally joins both table using the primary key index of TEMP. So it only performs one single full scan on YOUR_TABLE and then for each record from YOUR_TABLE gets the corresponding record from TEMP via the index.

Said differently, if you have to often locate records via one column (or a limited number of columns) you should ensure that the table has an index on those columns.