How to approximate the execution time of the ArangoDB count function

I am considering using ArangoDB for a new project of mine, but I have been unable to find very much information regarding its scalability.

Specifically, I am looking for some information regarding the count function. Is there a reliable way (perhaps a formula) to approximate how long it will take to count the number of documents in a collection which match a simple Boolean value?

All documents in the collection would have the same fields, but with different values. How can I determine how long it would take to count several hundred million documents?

Accepted answer by Ingo:

Just create a collection named users and insert as many random test documents as you need. For example, this AQL query inserts 1.1 million documents:

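FOR i IN 1..1100000
  INSERT {
    name: CONCAT("test", i),
    // FLOOR(RAND() * 55) is 0..54, so year falls in 1970..2024
    year: 1970 + FLOOR(RAND() * 55),
    // even i -> 'male', odd i -> 'female' (exactly half of each)
    gender: i % 2 == 0 ? 'male' : 'female'
  } IN users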
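If you would rather generate the test data outside the database (for example, to bulk-load several hundred million documents), the sketch below produces documents of the same shape as the INSERT loop above as JSON Lines. It is a minimal sketch: the function names iter_test_user_lines and write_test_users and the file name users.jsonl are hypothetical, and loading the file with ArangoDB's arangoimport tool is an assumption about your setup, not part of the original answer.

```python
import json
import random
from typing import Iterator, Optional, TextIO


def iter_test_user_lines(count: int, seed: Optional[int] = None) -> Iterator[str]:
    """Yield `count` test documents as JSON Lines (one JSON object per line).

    Mirrors the AQL loop: name "test<i>", a random year in 1970..2024,
    and gender chosen by the parity of i (even -> 'male').
    """
    rng = random.Random(seed)
    for i in range(1, count + 1):
        doc = {
            "name": f"test{i}",
            # randrange(55) yields 0..54, like FLOOR(RAND() * 55) in AQL
            "year": 1970 + rng.randrange(55),
            "gender": "male" if i % 2 == 0 else "female",
        }
        yield json.dumps(doc) + "\n"


def write_test_users(out: TextIO, count: int, seed: Optional[int] = None) -> None:
    """Stream the generated lines to an already-open text file."""
    out.writelines(iter_test_user_lines(count, seed))
```

Usage: open users.jsonl for writing, call write_test_users(f, 1_100_000), then (assuming arangoimport is available) load it with something like: arangoimport --file users.jsonl --type jsonl --collection users. Streaming through a generator keeps memory use flat regardless of the document count.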

Then do the count:

FOR user IN users
  FILTER user.gender == 'male'
  COLLECT WITH COUNT INTO number
  RETURN {
    number: number
  }
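To sanity-check the result, here is what the FILTER plus COLLECT WITH COUNT query computes, written as a plain Python loop. This only illustrates the semantics; it says nothing about how ArangoDB actually executes the query, and the helpers count_matching and generated_genders are hypothetical. Because the generated data assigns gender by the parity of i, the expected count for 1,100,000 documents is exactly 550,000, close to the optimizer's estimate of 550001 items in the EXPLAIN output below.

```python
from typing import Any, Iterable, Iterator, Mapping


def count_matching(docs: Iterable[Mapping[str, Any]], field: str, value: Any) -> int:
    """Count docs where doc[field] == value.

    Semantic equivalent of:
      FOR d IN docs FILTER d.<field> == value COLLECT WITH COUNT INTO n RETURN n
    A missing attribute is treated as null (None), which never equals value.
    """
    return sum(1 for doc in docs if doc.get(field) == value)


def generated_genders(count: int) -> Iterator[Mapping[str, Any]]:
    """Only the gender attribute of the test data; name and year don't affect the count."""
    for i in range(1, count + 1):
        yield {"gender": "male" if i % 2 == 0 else "female"}


print(count_matching(generated_genders(1_100_000), "gender", "male"))  # prints 550000
```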

If you use this query in production, make sure to add an index on the gender attribute as well. On my machine, the index reduced the execution time by a factor of more than 100 (0.043 s for 1.1 million documents).
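The question asks for a formula. There is no general one, but a rough extrapolation is possible if you assume the count time grows linearly with the number of documents, which is plausible while data and index fit in memory and likely optimistic beyond that. The sketch below fits t = a + b·n by ordinary least squares to timings you measure yourself at several collection sizes, then extrapolates to a larger n. The function names are hypothetical, and any prediction is only as good as the linearity assumption.

```python
from typing import Sequence, Tuple


def fit_linear(sizes: Sequence[float], seconds: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of seconds ~ a + b * size; returns (a, b).

    sizes: collection sizes you benchmarked (number of documents).
    seconds: measured query times at those sizes.
    """
    if len(sizes) != len(seconds) or len(sizes) < 2:
        raise ValueError("need at least two (size, seconds) pairs of equal length")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, seconds))
    b = sxy / sxx          # seconds per document
    a = mean_y - b * mean_x  # fixed per-query overhead
    return a, b


def predict_seconds(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted line to a new size (assumes linear scaling holds)."""
    return a + b * size
```

With only the single measurement above and a purely proportional model (a = 0), 300 million documents would come out at roughly 0.043 s × 300,000,000 / 1,100,000 ≈ 11.7 s. Treat that as an order-of-magnitude guess; measuring at three or more sizes and checking whether the points actually lie on a line is far more reliable.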

Run your query through EXPLAIN to estimate further how "expensive" its execution will be. With the index in place, the output looks like this:

Query string:
 FOR user IN users
   FILTER user.gender == 'male'
   COLLECT WITH COUNT INTO number
   RETURN { 
     number: number 
   }

Execution plan:
 Id   NodeType            Est.   Comment
  1   SingletonNode          1   * ROOT
  8   IndexRangeNode    550001     - FOR user IN users   /* hash index scan */
  5   AggregateNode          1       - COLLECT  WITH COUNT INTO number   /* sorted*/
  6   CalculationNode        1       - LET #4 = { "number" : number }   /* simple expression */
  7   ReturnNode             1       - RETURN #4

Indexes used:
 Id   Type   Collection   Unique   Sparse   Selectivity Est.   Fields     Ranges
  8   hash   users        false    false              0.00 %   `gender`   [ `gender` == "male" ]

Optimization rules applied:
 Id   RuleName
  1   use-index-range
  2   remove-filter-covered-by-index
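The "0.00 %" selectivity in the plan is not an error. ArangoDB reports an index's selectivity estimate as the ratio of distinct indexed values to indexed documents, so an attribute with only two values across 1.1 million documents is extremely unselective:

```latex
\text{selectivity} = \frac{\text{distinct values}}{\text{documents}}
                   = \frac{2}{1\,100\,000}
                   \approx 1.8 \times 10^{-6}
                   \approx 0.00018\,\%
```

This rounds to 0.00 % in the output. The estimate of 550001 items for the index scan is consistent with about half of the documents matching, so even with the index the work likely still grows with the number of matching documents.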