I have a very large graph with hundreds of millions of nodes and relationships, where I need to make a traversal to find if a specific node is connected with another one containing a particular property.
The data is highly interconnected, and for a pair of nodes there can be multiple relationships linking them.
Given that this operation needs to run on a real time system I have very strict time constraints, requiring no more than 200ms to find possible results.
So I have created the following TraversalDescriptor:
TraversalDescription td = graph.traversalDescription()
.depthFirst()
.uniqueness(Uniqueness.NODE_GLOBAL)
.expand(new SpecificRelsPathExpander(requiredEdgeProperty)
.evaluator(new IncludePathWithTargetPropertyEvaluator(targetNodeProperty));
The Evaluator checks for each path if the end node is my target, including and pruning the path if that's the case or excluding it and continuing if it's not. Also, I set a limit on the time spent traversing and the maximum number of results to find. Everything can be seen in the code below:
private class IncludePathWithTargetPropertyEvaluator implements Evaluator {
private String targetProperty;
private int results;
private long startTime, curTime, elapsed;
public IncludePathWithTargetPropertyEvaluator(String targetProperty) {
this.targetProperty = targetProperty;
this.startTime = System.currentTimeMillis();
this.results = 0;
}
public Evaluation evaluate(Path path) {
curTime = System.currentTimeMillis();
elapsed = curTime - startTime;
if (elapsed >= 200) {
return Evaluation.EXCLUDE_AND_PRUNE;
}
if (results >= 3) {
return Evaluation.EXCLUDE_AND_PRUNE;
}
String property = (String) path.endNode().getProperty("propertyName");
if (property.equals(targetProperty)) {
results = results + 1;
return Evaluation.INCLUDE_AND_PRUNE;
}
return Evaluation.EXCLUDE_AND_CONTINUE;
}
Finally I written a custom PathExpander because each time we need to traverse only edges with a specific property value:
private class SpecificRelsPathExpander implements PathExpander {
private String requiredProperty;
public SpecificRelsPathExpander(String requiredProperty) {
this.requiredProperty = requiredProperty;
}
public Iterable<Relationship> expand(Path path, BranchState<Object> state) {
Iterable<Relationship> rels = path.endNode().getRelationships(RelTypes.FOO, Direction.BOTH);
if (!rels.iterator().hasNext())
return null;
List<Relationship> validRels = new LinkedList<Relationship>();
for (Relationship rel : rels) {
String property = (String) rel.getProperty("propertyName");
if (property.equals(requiredProperty)) {
validRels.add(rel);
}
}
return validRels;
}
// not used
public PathExpander<Object> reverse() {
return null;
}
The issue is that the traverser keep going even long after the 200ms have passed.
From what I understood the evaluator behavior is to enqueue all following branches for each path evaluated with EXCLUDE_AND_CONTINUE, and the traverser itself won’t stop until it has visited all subsequent paths in the queue.
So what can happen is: if I have even few nodes with a very high degree it will result in thousands of paths to be traversed.
In that case, is there a way to make the traverser abruptly stop when the timeout has been reached and return possible valid paths that have been found in the while?
I would go with the following line of thought:
I think taking a look at Neo4J TraversalDescription Definition might be beneficial for you too.