I have a problem. I want to get all documents of a collection with ~1 million documents in it. I asked myself: what is the fastest way to get all documents of a collection? Is it with a cursor or with .all? And are there any recommendations for the batch_size?
cursor
from arango import ArangoClient
# Initialize the ArangoDB client.
client = ArangoClient()
# Connect to database as user.
db = client.db(<db>, username=<username>, password=<password>)
cursor = db.aql.execute('FOR doc IN <Collection> RETURN doc', stream=True, ttl=3600, batch_size=<batchSize>)
collection = [doc for doc in cursor]
.all - with custom HTTP Client
from arango import ArangoClient
from arango.http import DefaultHTTPClient

# Custom HTTP client with a longer request timeout.
class MyCustomHTTPClient(DefaultHTTPClient):
    REQUEST_TIMEOUT = 1000

# Initialize the ArangoDB client.
client = ArangoClient(http_client=MyCustomHTTPClient())
# Connect to database as user.
db = client.db(<db>, username=<username>, password=<password>)
collec = db.collection('<Collection>')
collection = [doc for doc in collec.all()]  # .all() also returns a cursor
If you want all documents in memory, then .all will be the fastest, because it uses the library's method for getting all the results, which is optimized. If you can process each document as it comes in, then the cursor is the best way to do it, to avoid the memory overhead. But the best way to decide is to run tests and measure the timing, because many factors can affect the speed, such as the type and speed of the connection to the DB, the amount of memory in your computer, etc. The examples you gave look simple enough to do such measurements pretty fast.