Neptune ConcurrentModificationException when inserting same type of vertices in different transaction

184 views Asked by At

We are working with the AWS Neptune Database and currently we are facing issues when two users save something to the database at the same time.

In this picture you can see our model in a simplified way. We have our root vertices with label "A" which then can have several "B" vertices which again can have several "C" vertices and then a bunch of connected vertices see graph model here

What we do is that we start a transaction, add C, D, E, F, G vertices and edges, commit transaction.

The problem occurs when now two users hit "save" at the same time for completely independent "A" vertices. User1 wants to save a new subgraph for vertex A (ID: 4711) and User2 wants to save a new subgraph for vertex A (ID: 4712). This subgraphs are completely independent and have no connection between them. My first assumption would be that adding a subgraph under vertex A (ID: 4711) has no impact on adding a subgraph under different vertex A (ID: 4712). The add operation can take a few seconds. And now if two transactions trying to add a subgraph at the same time we get this exception:

Caused by: java.util.concurrent.CompletionException: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: {"detailedMessage":"Failed to complete Insert operation for a Vertex due to conflicting concurrent operations. Please retry. 0 transactions are currently rolling back.","requestId":"64be5fdb-4208-4729-bc68-dccfecbfc87f","code":"ConcurrentModificationException"}

Caused by: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: {"detailedMessage":"Failed to complete Insert operation for a Vertex due to conflicting concurrent operations. Please retry. 0 transactions are currently rolling back.","requestId":"64be5fdb-4208-4729-bc68-dccfecbfc87f","code":"ConcurrentModificationException"}

I stopped in the debugger within the transaction and tried to add vertices over the Jupyter Notebook. I was not able to add a vertex "A" while the debugger was stopped. But it was no problem to insert a new vertex "XYZ" which is not used in the graph before.

I assume that adding vertices blocks the index for other transaction. That is would I found in the documentation:

I would say that we use SERIALIZABLE as isolation level: https://docs.aws.amazon.com/neptune/latest/userguide/transactions-isolation-levels.html: READ UNCOMMITTED – Allows all three kinds of interaction (that is, dirty reads, non-repeatable reads, and phantom reads). READ COMMITTED – Dirty reads are not possible, but nonrepeatable and phantom reads are. REPEATABLE READ – Neither dirty reads nor nonrepeatable reads are possible, but phantom reads still are. **SERIALIZABLE **– None of the three types of interaction phenomena can occur.

https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-transactions.html: Sessionless read-only queries are executed under SNAPSHOT isolation, but read-only queries run within an explicit transaction are executed under SERIALIZABLE isolation. The read-only queries executed under SERIALIZABLE isolation incur higher overhead and can block or get blocked by concurrent writes, unlike those run under SNAPSHOT isolation.

The queries look like this:

g.V("f6e9a3ed-5b2e-400a-83f4-074af0fee71f").fold().coalesce(unfold(),addV("C").property(single,"status",'"abc"').property(T.id,"f6e9a3ed-5b2e-400a-83f4-074af0fee71f")).id().fold()

I just thought that the coalesce might be the problem as in this post: ConcurrentModificationException in amazon neptune using gremlin javascript language variant but I tried to stop my debugger in a transaction and just executed g.addV("C") and had the same issue.

Is it possible to add subgraphs for different graphs at the same time?

After some deeper investigation I found that the first query in our transaction blocks the whole database. Do you know why this query is blocking everything?

g.V().hasLabel("A").has("v",4711).outE("s").inV()
.hasLabel("B").hasId("c4...a","a0...f","ac...3")
.outE("h").inV()
.bothE("sv","ev")
.bothV().not(hasLabel("C")).simplePath()
.barrier()
.repeat(outE().not(hasLabel("sv","ev")).simplePath().inV())
.until(or(outE().count().is(0),hasLabel("C"),hasLabel("AP","AC").bothE("sv","ev").count().is(P.gt(0))))
.path().unfold().dedup().or(hasLabel("sp").has("ev"),hasLabel("ev")).barrier()
.drop()
2

There are 2 answers

5
Taylor Riggan On

I believe what you are seeing is the effect of gap locks [1]. If you are using session-based transactions, as soon as you go to write a given vertex, edge, or property, the gap in the associated index where those values are being stored will be locked. So anything else attempting to write into the same gap would encounter a deadlock situation. In Neptune, deadlocks are treated as "winner first", so any subsequent requests to write in a deadlock situation gets thrown a ConcurrentModificationException.

The other item of note is that the values written into the index are not the values that you are writing in your query. Neptune uses a dictionary and stores the dictionary IDs for a given value in the indexes. That dictionary value is non-deterministic and not something that you can predict ahead of time. Which makes trying to avoid CMEs a bit difficult.

The current guidance with CMEs is to implement a retry and exponential backoff strategy [2] into your application. In most cases, an immediate retry will succeed. If you have long running transactions, you may want to adjust the wait/retry period or have a steeper exponential curve for retry times on each attempt.

[1] https://docs.aws.amazon.com/neptune/latest/userguide/transactions-neptune.html#transactions-neptune-false-conflicts

[2] https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/retry-backoff.html

0
WobiHH On

In the end we did these things to make it work:

  • Use read queries without a transaction, so we read with SNAPSHOT Isolation
  • Change drop queries from fetch-and-drop to explicitly drop edges and vertices by ID in one transaction and before we have read the needed subgraph without transaction.

This is currently working and we need to ensure that the data stays consistent because of not reading with transaction.