Timeout on Neo4j traversal framework

162 views Asked by At

I have a very large graph with hundreds of millions of nodes and relationships, where I need to make a traversal to find if a specific node is connected with another one containing a particular property.
The data is highly interconnected, and for a pair of nodes there can be multiple relationships linking them.

Given that this operation needs to run on a real time system I have very strict time constraints, requiring no more than 200ms to find possible results.

So I have created the following TraversalDescriptor:

TraversalDescription td = graph.traversalDescription()
                           .depthFirst()
                           .uniqueness(Uniqueness.NODE_GLOBAL)
                           .expand(new SpecificRelsPathExpander(requiredEdgeProperty)
                           .evaluator(new IncludePathWithTargetPropertyEvaluator(targetNodeProperty));

The Evaluator checks for each path if the end node is my target, including and pruning the path if that's the case or excluding it and continuing if it's not. Also, I set a limit on the time spent traversing and the maximum number of results to find. Everything can be seen in the code below:

private class IncludePathWithTargetPropertyEvaluator implements Evaluator {

private String targetProperty;
private int results;
private long startTime, curTime, elapsed;       

public IncludePathWithTargetPropertyEvaluator(String targetProperty) {
    this.targetProperty = targetProperty;
    this.startTime = System.currentTimeMillis();
    this.results = 0;
}

public Evaluation evaluate(Path path) {

    curTime = System.currentTimeMillis();
    elapsed = curTime - startTime;

    if (elapsed >= 200) {
        return Evaluation.EXCLUDE_AND_PRUNE;
    }

    if (results >= 3) {
        return Evaluation.EXCLUDE_AND_PRUNE;
    }

    String property = (String) path.endNode().getProperty("propertyName");

    if (property.equals(targetProperty)) {
        results = results + 1;
        return Evaluation.INCLUDE_AND_PRUNE;
    }

    return Evaluation.EXCLUDE_AND_CONTINUE;
}

Finally I written a custom PathExpander because each time we need to traverse only edges with a specific property value:

private class SpecificRelsPathExpander implements PathExpander {

private String requiredProperty; public SpecificRelsPathExpander(String requiredProperty) { this.requiredProperty = requiredProperty; } public Iterable<Relationship> expand(Path path, BranchState<Object> state) { Iterable<Relationship> rels = path.endNode().getRelationships(RelTypes.FOO, Direction.BOTH); if (!rels.iterator().hasNext()) return null; List<Relationship> validRels = new LinkedList<Relationship>(); for (Relationship rel : rels) { String property = (String) rel.getProperty("propertyName"); if (property.equals(requiredProperty)) { validRels.add(rel); } } return validRels; } // not used public PathExpander<Object> reverse() { return null; }

The issue is that the traverser keep going even long after the 200ms have passed.

From what I understood the evaluator behavior is to enqueue all following branches for each path evaluated with EXCLUDE_AND_CONTINUE, and the traverser itself won’t stop until it has visited all subsequent paths in the queue.
So what can happen is: if I have even few nodes with a very high degree it will result in thousands of paths to be traversed.

In that case, is there a way to make the traverser abruptly stop when the timeout has been reached and return possible valid paths that have been found in the while?

2

There are 2 answers

1
0x90 On

I would go with the following line of thought:

Once the timeout was elapsed stop expanding the graph.

private class SpecificRelsPathExpander implements PathExpander {

private String requiredProperty;
private long startTime, curTime, elapsed;

public SpecificRelsPathExpander(String requiredProperty) {
    this.requiredProperty = requiredProperty;
    this.startTime = System.currentTimeMillis();
}

public Iterable<Relationship> expand(Path path, BranchState<Object> state) {
    curTime = System.currentTimeMillis();
    elapsed = curTime - startTime;

    if (elapsed >= 200) {
        return null;
    }

    Iterable<Relationship> rels = path.endNode().getRelationships(RelTypes.FOO, Direction.BOTH);
    if (!rels.iterator().hasNext())
        return null;
    List<Relationship> validRels = new LinkedList<Relationship>();
    for (Relationship rel : rels) {
        String property = (String) rel.getProperty("propertyName");
        if (property.equals(requiredProperty)) {
            validRels.add(rel);
        }
    }
    return validRels;
}

// not used
public PathExpander<Object> reverse() {
    return null;
}

I think taking a look at Neo4J TraversalDescription Definition might be beneficial for you too.

0
Mattias Finné On

I would implement the expander to keep the lazy nature of the traversal framework, also for it's simpler code. This would prevent the traversal eagerly collecting all relationships for a node, Like this:

public class SpecificRelsPathExpander implements PathExpander, Predicate<Relationship>
{
    private final String requiredProperty;

    public SpecificRelsPathExpander( String requiredProperty )
    {
        this.requiredProperty = requiredProperty;
    }

    @Override
    public Iterable<Relationship> expand( Path path, BranchState state )
    {
        Iterable<Relationship> rels = path.endNode().getRelationships( RelTypes.FOO, Direction.BOTH );
        return Iterables.filter( this, rels );
    }

    @Override
    public boolean accept( Relationship relationship )
    {
        return requiredProperty.equals( relationship.getProperty( "propertyName", null ) );
    }

    // not used
    @Override
    public PathExpander<Object> reverse()
    {
        return null;
    }
}

Also the traversal will continue as long as the client, i.e. the one holding the Iterator received from starting the traversal calls hasNext/next. There will be no traversal on its own, it all happens in hasNext/next.