In our project we are testing how transactions work in a distributed environment. As part of the project we are testing the open-source edition of GridGain 6.5.5.
We have run into a number of problems in the following test case:
- We are testing a cache without any additional rules.
- The cache stores an id-String as a key and BigDecimal as a value.
- We are testing basic operations (addition and subtraction) on values of the first cache from 6, 12 and 18 clients. One operation looks like "subtract X from A, add X to B".
- The GridGain application is deployed as a .war file in WildFly.
- Clients connect over HTTP to the WildFly instance running GridGain and send a list of operations to perform (we are testing batches of 1, 50, 500, 1000 and 5000 operations).
- We are testing clustered multi-node mode with transactions; the configuration files we used are linked below.
- We have tested both pessimistic and optimistic transactions separately.
- We call the resulting values "consistent" if they are equal to those obtained in dummy mode: one client, batch=1, one node. We have a dummy program for cross-checking (its results in this mode are always equal to those of GridGain in local mode).
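For illustration, the cross-check described above could be sketched as follows. This is a minimal sketch in plain Java, not the actual dummy program: the `Transfer` type and the method names are hypothetical, and missing keys are assumed to start at zero. It applies the operations one by one and counts keys whose values differ from a second result map.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReferenceCheck {
    /** Hypothetical operation type: subtract amount from key `from`, add it to key `to`. */
    public static final class Transfer {
        final String from;
        final String to;
        final BigDecimal amount;

        public Transfer(String from, String to, BigDecimal amount) {
            this.from = from;
            this.to = to;
            this.amount = amount;
        }
    }

    /** Applies the operations sequentially to a copy of the initial state (one client, batch=1). */
    public static Map<String, BigDecimal> applySequentially(Map<String, BigDecimal> initial,
                                                            List<Transfer> ops) {
        Map<String, BigDecimal> state = new HashMap<>(initial);
        for (Transfer t : ops) {
            // Assumption: a key that is not present yet is treated as zero.
            state.merge(t.from, t.amount.negate(), BigDecimal::add);
            state.merge(t.to, t.amount, BigDecimal::add);
        }
        return state;
    }

    /** Counts keys of `expected` whose value is missing or numerically different in `actual`. */
    public static long countMismatches(Map<String, BigDecimal> expected,
                                       Map<String, BigDecimal> actual) {
        return expected.entrySet().stream()
                .filter(e -> {
                    BigDecimal other = actual.get(e.getKey());
                    // compareTo ignores scale, so 1.0 and 1.00 count as equal.
                    return other == null || other.compareTo(e.getValue()) != 0;
                })
                .count();
    }

    /** Sum of all values; transfers between keys should leave it unchanged. */
    public static BigDecimal total(Map<String, BigDecimal> state) {
        return state.values().stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        Map<String, BigDecimal> initial = new HashMap<>();
        initial.put("A", new BigDecimal("100"));
        initial.put("B", new BigDecimal("50"));
        List<Transfer> ops = Arrays.asList(
                new Transfer("A", "B", new BigDecimal("30")),
                new Transfer("B", "A", new BigDecimal("10")));

        Map<String, BigDecimal> expected = applySequentially(initial, ops);
        System.out.println("expected = " + expected + ", total = " + total(expected));
        System.out.println("mismatches vs. itself = " + countMismatches(expected, expected));
    }
}
```

Two details may matter when comparing against the grid: `BigDecimal.equals` is scale-sensitive while `compareTo` is not, and the sum over all keys should stay constant, which gives a cheap sanity check even without a reference run.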
The issues are:
- If we run the transactions as-is (subtract from one key's value, add to another), we face two problems: deadlocks, and inconsistent values when no deadlock occurs. The number of inconsistent values is small, but we can't eliminate them: about 12 per 1000 key-value pairs.
- If we sort the requests by key in each client (so the order of operations may change), we avoid both deadlocks and inconsistency. But we run into other issues: if the batch size is at least 500, transactions keep failing endlessly. If the batch is small, GridGain fails completely (it doesn't respond to the current query).
- Everything runs very slowly (about 6 seconds for a batch of 1000 operations) while there is almost no CPU load. Is this expected?
Our hardware:
8x Dell M620 blades, 256GB RAM, 2x 8-core Xeon E5-2650 v2, 10GbE network.
Attachments:
- GridGain optimistic config: https://gist.github.com/al-indigo/a2824aa62a3af8b18932
- GridGain pessimistic config: the same but with
- GridGain log for second issue: https://gist.github.com/al-indigo/233058772418fba8d341
(Moved from a comment)
To avoid deadlocks, you need to make sure that you always acquire locks in the same order. This is required when working with transactions in any system of record, be it an Oracle database or the GridGain data grid.
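The lock-ordering rule can be illustrated with a minimal sketch in plain Java. It deliberately does not use the GridGain API: the `OrderedLocking` class, its per-key `ReentrantLock` map, and the thread and operation counts are all assumptions made for the example. Each transfer locks the lexicographically smaller key first, so opposing transfers (A to B and B to A) can never wait on each other in a cycle.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class OrderedLocking {
    private final Map<String, BigDecimal> balances = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void put(String key, BigDecimal value) { balances.put(key, value); }

    public BigDecimal get(String key) { return balances.get(key); }

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    /** Subtracts amount from `from` and adds it to `to`, always locking the smaller key first. */
    public void transfer(String from, String to, BigDecimal amount) {
        boolean fromFirst = from.compareTo(to) <= 0;
        ReentrantLock first = lockFor(fromFirst ? from : to);
        ReentrantLock second = lockFor(fromFirst ? to : from);
        first.lock();
        try {
            second.lock(); // reentrant, so from == to is also safe
            try {
                balances.merge(from, amount.negate(), BigDecimal::add);
                balances.merge(to, amount, BigDecimal::add);
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    /** Runs opposing A->B and B->A transfers concurrently; true if all work finished in time. */
    public static boolean runDemo(OrderedLocking store, int threads, int opsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final boolean forward = t % 2 == 0;
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    if (forward) {
                        store.transfer("A", "B", BigDecimal.ONE);
                    } else {
                        store.transfer("B", "A", BigDecimal.ONE);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        OrderedLocking store = new OrderedLocking();
        store.put("A", new BigDecimal("1000"));
        store.put("B", new BigDecimal("1000"));
        boolean finished = runDemo(store, 6, 1000);
        System.out.println("finished = " + finished
                + ", A + B = " + store.get("A").add(store.get("B")));
    }
}
```

In a data grid the equivalent would be to touch the keys of each transaction in one globally consistent order (for example, sorted by key) on every client, so that no two transactions hold locks the other is waiting for.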
As for performance, it should be very fast. Most likely this is a configuration issue. Could you provide a reproducible example? (You can use pastebin.com to share your code.)