What is the fastest way to get all documents of a collection?


I have a problem. I want to get all documents of a collection that holds about 1 million documents, and I am wondering what the fastest way to retrieve them is. Is it with a cursor or with .all()? And are there any recommendations for the batch_size?

With a cursor:

from arango import ArangoClient

# Initialize the ArangoDB client.
client = ArangoClient()

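# Connect to the database as a user.
db = client.db(<db>, username=<username>, password=<password>)

# Run a streaming AQL query; each batch of <batchSize> documents is one HTTP round trip,
# and ttl is the server-side cursor's time-to-live in seconds.
cursor = db.aql.execute('FOR doc IN <Collection> RETURN doc', stream=True, ttl=3600, batch_size=<batchSize>)
# Drain every batch into a single in-memory list.
collection = list(cursor)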
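
As a rough way to reason about batch_size: each batch of a cursor is fetched with its own HTTP request, so for about 1 million documents the number of requests shrinks as the batch grows, while each response (and the memory needed to hold it) gets larger. The sketch below only does that arithmetic; round_trips is a hypothetical helper, and it ignores server-side and network costs, so treat it as a starting point for measurements rather than a recommendation.

```python
import math


def round_trips(total_docs: int, batch_size: int) -> int:
    """Approximate number of HTTP requests needed to drain a cursor.

    Hypothetical helper: assumes each request returns a full batch and
    that the first batch arrives with the initial query response.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The initial query is one request even when the result is empty.
    return max(1, math.ceil(total_docs / batch_size))


for size in (1_000, 10_000, 100_000):
    print(f"batch_size={size}: {round_trips(1_000_000, size)} requests")
```

This prints 1000, 100 and 10 requests respectively; beyond a certain size the saving in round trips is small compared with the cost of building and transferring very large responses.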

With .all() and a custom HTTP client:

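from arango import ArangoClient
from arango.http import DefaultHTTPClient

# Subclass the concrete DefaultHTTPClient: the abstract HTTPClient base does not
# implement create_session()/send_request() and cannot be instantiated on its own.
class MyCustomHTTPClient(DefaultHTTPClient):
    # Timeout in seconds. Check that your python-arango version reads this
    # attribute; newer releases may take a request_timeout argument instead.
    REQUEST_TIMEOUT = 1000

# Initialize the ArangoDB client.
client = ArangoClient(
    http_client=MyCustomHTTPClient())

# Connect to the database as a user.
db = client.db(<db>, username=<username>, password=<password>)

collec = db.collection('<Collection>')
# .all() returns a cursor; list() drains it into memory.
collection = list(collec.all())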

Accepted answer by BenVida

If you want all the documents in memory, .all() will be the fastest, because it uses the library's own optimized method for fetching all the results.

If you can process each document as it arrives, the cursor is the better choice, because it avoids holding the whole collection in memory at once.
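
A minimal sketch of the streaming approach, assuming the documents only need to be handled in fixed-size groups (for example written to disk or aggregated) rather than all at once. It relies only on the cursor being an ordinary Python iterable, as the question's own list comprehension over the cursor already does; iter_chunks and chunk_size are hypothetical names, and a plain generator stands in for the cursor so the sketch runs without a database.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def iter_chunks(docs: Iterable[Doc], chunk_size: int) -> Iterator[List[Doc]]:
    """Yield lists of at most chunk_size documents, pulling lazily from docs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for the cursor returned by
# db.aql.execute('FOR doc IN <Collection> RETURN doc', stream=True, ...)
fake_cursor = ({"_key": str(i), "value": i} for i in range(25_000))

total = 0
for chunk in iter_chunks(fake_cursor, chunk_size=10_000):
    # Only this chunk (plus the cursor's current batch) is held at once;
    # replace the sum with real processing, e.g. writing the chunk to disk.
    total += sum(doc["value"] for doc in chunk)

print(total)  # → 312487500
```

Keeping chunk_size separate from the query's batch_size lets the processing code stay independent of how the driver pages results from the server.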

But the best way to decide is to run tests and measure the timing, because many factors can affect the speed: the type and speed of the connection to the database, the amount of memory on your machine, and so on. The examples you gave are simple enough to benchmark quickly.
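
One way to run that comparison is a small timing harness that loads each candidate into memory the same way and reports the best of a few runs. This is a sketch: time_fetch is a hypothetical helper, and the stand-in fetchers should be replaced with the real calls from the question (a db.aql.execute(...) cursor for several batch_size values, and collec.all()), each wrapped in a lambda so every run starts a fresh query.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def time_fetch(fetch: Callable[[], Iterable[Any]], repeats: int = 3) -> Tuple[float, int]:
    """Return (best wall-clock seconds, document count) for loading fetch() into a list.

    fetch must start a fresh query on every call, because a cursor can only
    be drained once.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        docs = list(fetch())  # load everything, as both snippets in the question do
        best = min(best, time.perf_counter() - start)
        count = len(docs)
    return best, count


# With a real connection, the candidates might be (not executed here):
#   "cursor, batch_size=10000": lambda: db.aql.execute(
#       "FOR doc IN <Collection> RETURN doc", stream=True, batch_size=10_000),
#   ".all()": lambda: collec.all(),
# Stand-ins so the sketch runs without a database:
candidates: Dict[str, Callable[[], Iterable[Any]]] = {
    "generator": lambda: (i for i in range(100_000)),
    "list": lambda: list(range(100_000)),
}

for name, fetch in candidates.items():
    seconds, n = time_fetch(fetch)
    print(f"{name}: {n} docs in {seconds:.4f} s")
```

Keeping the best of several runs reduces noise from caching and network jitter, and comparing the returned counts confirms that every strategy actually fetched the whole collection.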