I have a query (link below) I must execute once per day or once per week in my application to find groups of connected users. In the query I check all possible groups for each user of the application (not all users are evaluated but could be a lot). For the moment I'm only making performance tests in localhost using Gremlin Server, since my application is not live yet.
The problem is that when testing this query simulating many users the query reaches the time limit a request can take that is configured in Gremlin Server by default, another problem is that the query does not take full CPU usage since it seems a single query is designed to use a single thread or a reduced amount of CPU processing in some way.
So I have 2 solutions in mind, divide the query in one chunk per user or use OLAP:
Solution 1: Send a query to get the users first and then send one query per user, then remove duplicates in the server code, this should work in my case and since I can send all the queries at the same time I can use all resources available and bypass the time limits.
Solution 2: Use OLAP. I guess OLAP does not have a time limit. The problem: My idea is to use Amazon Neptune and OLAP is not supported there as far as I know. In this question about it: Gremlin OLAP queries on AWS Neptune
David says: Update: Since GA (June 2018), Neptune supports multiple queries in a single request/transaction
What does it mean "multiple queries in a single request"?
How my solution 1 compares with OLAP?
Should I look for another database service that supports OLAP instead of Neptune? Which one could be? I don't want an option that implies learning to setup my own "Neptune like" server, I have limited time.
My query in case you want to take a look: https://gremlify.com/69cb606uzaj
This is a bit of a complicated question.
I'll assume there is a reason you can't change the default value, but for those who might be reading this answer the timeout is configurable both at the server (with
evaluationTimeout
in the server yaml) and per request both for scripts and bytecode based requests.If you're testing with TinkerGraph in Gremlin Server then know that TinkerGraph is really simple. It doesn't do anything internally to run any aspect of a traversal in parallel (without TinkerGraphComputer which is OLAP related).
Either approach has the potential to work. In the first solution you suggest a form of poor man's OLAP where you must devise your own methods for doing this parallel processing (i.e. manage thread pools, synchronize state, etc). I think that this approach is a common first step that folks take to deal with this sort of problem. I'd wonder if you need to be as fine grained as one user per request. I would think that sending several at a time would be acceptable but only testing in your actual environment would yield the answer to that. The nice thing about this solution is that it will typically work on any graph system, including Neptune.
Using your second solution with OLAP is trickier. You have the obvious problem that Neptune does not directly support it, but going to a different provider that does will not instantly solve your problem. While OLAP rids you of having to worry about how to optimally parallelize your workload, it doesn't mean that you can instantly take that Gremlin query you want to run, throw it into Spark and get an instant win. For example, and I take this from the TinkerPop Reference Documentation:
In your query, there are already a places where you "leave the star graph" so you would immediately find problems there to solve. Usually that limitation can be worked around for OLAP purposes but it's not as simple as adding
withComputer()
to your traversal and getting a win in this case.Going further down this path of using OLAP with a graph other than Neptune, you would probably want to at least consider if this complex traversal could be better written as a custom
VertexProgram
which might better bind your use case to the the capabilities of BSP than what the more genericTraversalVertexProgram
does when processing arbitrary Gremlin. For that matter, a mix of Gremlin OLAP, a customVertexProgram
and some standard map/reduce style processing might ultimately lead to the most elegant and efficient answer.An idea I've been considering for graphs that don't support OLAP has been to
subgraph()
(with Java) the portion of the graph that is relevant to your algorithm and then execute it locally in TinkerGraph! I think that might make sense in some use cases where the algorithm has some limits that can be defined ahead of time to form the subgraph, where those limits can be easily filtered and where the resulting subgraph is not so large that it takes an obscene amount of time to construct. It would be even better if the subgraph had some use beyond a single algorithm - almost behaving like a cache graph. I have no idea if that is useful to you but it's a thought. Here's a recent blog post I wrote that talks about writing VertexPrograms. Perhaps you will find it interesting.All that said about OLAP, I think that your first solution seems fine to start with. You don't have a multi-billion edge graph yet and can probably afford to take this approach for now.
I believe that this just means that you can send a script like:
where multiple Gremlin commands can be executed within the scope of a single transaction where each command must be "separated by newline ('\n'), spaces (' '), semicolon ('; '), or nothing (for example: g.addV(‘person’).next()g.V() is valid)". I think that only the last command returns a value. It doesn't seem like that particular feature would be helpful in your case. I would look more to batch users within a particular request where possible.