NullPointerException on query in Neo4j 3.0.4 enterprise

200 views Asked by At

I am running Neo4j 3.0.4 Enterprise in HA and am encountering an extremely mysterious issue. On doing a very basic Cypher query from our Spring Data Neo4j application (following is taken from the Neo4j DB query.log):

MATCH (n:`Node`) 
WHERE n.`key` = { `key` } 
WITH n 
MATCH p=(n)-[*0..1]-(m) 
RETURN p, ID(n) - {key: 123456}

I get an exception related to the page cache:

java.lang.NullPointerException
    at org.neo4j.io.pagecache.impl.muninn.MuninnPageCursor.assertPagedFileStillMappedAndGetIdOfLastPage(MuninnPageCursor.java:369)
    at org.neo4j.io.pagecache.impl.muninn.MuninnReadPageCursor.next(MuninnReadPageCursor.java:55)
    at org.neo4j.io.pagecache.impl.muninn.MuninnPageCursor.next(MuninnPageCursor.java:121)
    at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:1039)
    at org.neo4j.kernel.impl.store.CommonAbstractStore.access$000(CommonAbstractStore.java:64)
    at org.neo4j.kernel.impl.store.CommonAbstractStore$1.next(CommonAbstractStore.java:1179)
    at org.neo4j.kernel.impl.api.store.StoreSingleNodeCursor.next(StoreSingleNodeCursor.java:64)
    at org.neo4j.kernel.impl.api.StateHandlingStatementOperations.nodeCursorById(StateHandlingStatementOperations.java:137)
    at org.neo4j.kernel.impl.api.ConstraintEnforcingEntityOperations.nodeCursorById(ConstraintEnforcingEntityOperations.java:422)
    at org.neo4j.kernel.impl.api.OperationsFacade.nodeGetProperty(OperationsFacade.java:333)
    at org.neo4j.cypher.internal.spi.v3_0.TransactionBoundQueryContext$NodeOperations.getProperty(TransactionBoundQueryContext.scala:316)

And then from here on out, the same query will fail intermittently and will be logged as a one liner:

java.lang.NullPointerException

On the app server side, we get the following generic error code:

org.neo4j.ogm.exception.CypherException: Error executing Cypher "Neo.DatabaseError.Statement.ExecutionFailed"
  • Has any Neo4j experts seen this issue before?
  • Moving the master in my cluster (ie restarting) seems to fix it but this resurfaces if any significant amount of load is placed on the server. The page cache should be fairly large as I have an 8 GB database and have left not only the heap size as default but the page cache size unset too so that it takes on its default value.
  • The logs indicate that there are a large number of concurrent queries right before this exception (all happening within the same second and quite a few querying for the exact same thing). Could there be some type of race condition in how the page cache works? What is a realistic limit on concurrent reads?

Any advice at all is greatly appreciated!

0

There are 0 answers