I am experimenting on UPSERT-statements for multiple items (around 20 - 30k) using paython-arango and a aql query. Overall, this method seems to be much slower in python compared to NodeJS anyway but I tried two different methods and suprisingly the one I thought its faster is much slower and I try to understand why.
My initial method was to upsert every single item separatly with a statement like this:
aql = "UPSERT {_key: '" + data["_key"] + "'}"
aql += " INSERT " + doc
aql += " UPDATE " + doc
aql += " IN " + col_name
self.db.aql.execute(aql)
As I said before, this was slower than a similar NodeJS script in NodeJS maybe because NodeJS works async. Overall approx 45s für 27k entries (2.4 MB csv-dump). A lot of time for so little data.
Then I thought, one by one could be the problem and it could be better to send multiple items per query to the db like this:
aql = "FOR itm IN " + json.dumps(items)
aql += " UPSERT {_key: itm._key}"
aql += " INSERT itm"
aql += " UPDATE itm"
aql += " IN " + col_name
But actually, this was worse. Maybe 5 minutes and the ArangoDB-Backend was seriously slower and for a short time not even accessibly.
I am curious: Why is one single (ok, I used batches of 2, 10, 50, 100 items but no effect) aql-statement with FOR so much slower than 27k seperate requests? I expected otherwise. And is there a better way to perform multiple UPSERTs at once (with Python)?
Its a Windows-system with 2 Cores and 16 GB RAM. Reading performance is good but writing ...
UPSERT performs a nested query to look for a matching document. However, that does not explain why the single query performs so much worse than the multiple queries, unless of course you perform these queries asynchronously/in parallel. I would have to look into this in more detail.
However, since you use the
_key
to identify your document and yourINSERT
andUPDATE
statements look identical, you might be better off usingoverwriteMode
for theINSERT
instead (see https://www.arangodb.com/docs/stable/aql/operations-insert.html#query-options):But when working with a large number of documents you should consider using bulk document creation instead (https://www.arangodb.com/docs/stable/http/document-working-with-documents.html#create-multiple-documents) which also supports
overwriteMode
. However, I don't know if the python-arango driver supports these functions.