I'm looking at performing graph aggregate (groupBy, groupCount) queries across edges on a TitanGraph DB over two data sets:
About 10,000 nodes and about 1 million edges
About 200,000 nodes and about 1 billion edges
Does anyone know at what point I need to put in the effort to install Faunus to be able to run this type of Gremlin query within, say, 1 minute?
At 10,000 nodes and 1M edges, you shouldn't have problems with plain Gremlin (no Faunus). See the code below, where I generate a graph of approximately that size using Furnace:
Recalling your post here on aggregates, I basically execute the same query on this data set.
As you can see, it takes about 1.5 seconds to do this traversal (it's about 500 ms on TinkerGraph, which is all in memory).
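For anyone unfamiliar with what a groupCount across edges computes, here is a minimal sketch in plain Python rather than Gremlin. It is not Titan, Furnace, or the query above: the uniformly random edge list and the function names `random_edges` and `group_count_by_out_vertex` are assumptions made up for illustration. The point is only that the aggregate has to touch every edge once.

```python
import random
from collections import Counter


def random_edges(num_nodes, num_edges, seed=42):
    """Yield (out_vertex, in_vertex) pairs picked uniformly at random.

    Illustrative stand-in for a generated graph; not the Furnace generator.
    """
    rng = random.Random(seed)
    for _ in range(num_edges):
        yield rng.randrange(num_nodes), rng.randrange(num_nodes)


def group_count_by_out_vertex(edges):
    """Count edges per out-vertex in a single pass over all edges."""
    counts = Counter()
    for out_v, _in_v in edges:
        counts[out_v] += 1
    return counts


# Small sizes so the sketch runs quickly; scale up to taste.
counts = group_count_by_out_vertex(random_edges(1_000, 100_000))
print(sum(counts.values()))  # total of all groups equals the number of edges scanned
```

Because the work is one pass over the edges, the cost grows roughly with the edge count, not the node count.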
At 1B edges you will likely need Faunus. I don't think you could iterate over all those edges in under a minute, even if you could somehow fit them all in memory. Note that even with Faunus you might not get one-minute query/answer times; you will need to experiment a bit.
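A rough back-of-the-envelope check of that claim, as a Python sketch. It assumes the traversal time scales linearly with edge count from the roughly 1.5 seconds per 1M edges mentioned above, which is optimistic, since 1B edges will not enjoy the same caching. The function names are hypothetical.

```python
def edges_per_second(edges, seconds):
    """Throughput implied by a timed traversal."""
    return edges / seconds


def projected_seconds(total_edges, rate):
    """Time to scan total_edges at a constant rate (linear-scaling assumption)."""
    return total_edges / rate


# Figures taken from the timing above: ~1M edges in ~1.5 s.
rate = edges_per_second(1_000_000, 1.5)
full_scan = projected_seconds(1_000_000_000, rate)

# Throughput that a one-minute answer over 1B edges would require.
required_rate = 1_000_000_000 / 60

print(f"measured rate: {rate:,.0f} edges/s")
print(f"projected scan of 1B edges: {full_scan / 60:.1f} minutes")
print(f"speed-up needed for 1 minute: {required_rate / rate:.1f}x")
```

Under these assumptions a single-machine scan of 1B edges takes about 25 minutes, so hitting one minute would need roughly a 25x speed-up.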