Cypher performance when matching a million users

I am using RedisGraph, and the query is simple. How can I make it faster to get a list of distinct country codes like this?

> GRAPH.profile g "MATCH (u:user) return collect(distinct u.countryCode) as codes" 
1) "Results | Records produced: 1, Execution time: 0.001353 ms"
2) "    Aggregate | Records produced: 1, Execution time: 238.989679 ms"
3) "        Node By Label Scan | (u:user) | Records produced: 833935, Execution time: 81.158457 ms"

There is 1 answer

Vincent Rupp

Here's what your query is doing:

  1. Finding every user node in the graph. The profile shows this as a Node By Label Scan, which reads from the label store, so a property index does not speed it up. It produced 833,935 records in 81 ms.
  2. Looking up every country code on each node. Property retrieval also takes time, but the bulk of the time here is dropping duplicate records. There are only 180 or so countries, so about 833k of those user nodes didn't contribute to your end result. This took 239ms.
  3. Returning results: super fast.

I don't see a great way to speed this up, with the graph designed as is. Make sure user nodes and countryCode are indexed though. You could consider splitting out Country as its own node type, and then you can just match (c:Country). However, you run the risk of creating dense nodes because the USA, for example, probably has more users than Albania.
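As a rough sketch of the Country-node idea above, the queries below create one :Country node per distinct code, link each user to it, and then list the codes without touching the user nodes. The `code` property and the `IN_COUNTRY` relationship type are names made up for illustration, and I'm assuming your RedisGraph version accepts MERGE after MATCH and `IS NOT NULL`. Run each statement separately with GRAPH.QUERY.

```cypher
// One-time migration (hypothetical names: Country.code, IN_COUNTRY).
// For each user, find or create its country node, then link the two.
MATCH (u:user)
WHERE u.countryCode IS NOT NULL
MERGE (c:Country {code: u.countryCode})
MERGE (u)-[:IN_COUNTRY]->(c)

// Listing the codes now scans only the ~180 Country nodes.
MATCH (c:Country)
RETURN collect(c.code) AS codes
```

The migration is one large write over every user, so it may be worth trying on a copy of the graph first. If you only need the list of codes, the second MERGE can be dropped, which also avoids the dense-node concern.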

If you're going to need a list of country codes often and you can't alter the graph, then you could look at trickier things, like adding a :FirstInCountry label to one :user node per country, or reserving a range of node ids, such as 10000 to 10180, for one user per unique country code.
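A minimal sketch of the :FirstInCountry idea might look like the following. It assumes a RedisGraph version that supports adding labels with SET (older releases may not), and `firstUser` is just an illustrative variable name. Run each statement separately.

```cypher
// One-time pass: pick one arbitrary user per country code and tag it.
MATCH (u:user)
WHERE u.countryCode IS NOT NULL
WITH u.countryCode AS code, collect(u)[0] AS firstUser
SET firstUser:FirstInCountry

// The list of codes then comes from a label scan over the tagged nodes only.
MATCH (u:FirstInCountry)
RETURN collect(u.countryCode) AS codes
```

The catch is upkeep: the label has to be reassigned whenever a tagged user is deleted or changes country, and added when the first user from a new country shows up.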

Edit: I said the wrong thing originally. The initial :user lookup is based on the label store, so an index there is irrelevant.