How do Narayana/XA recover from TM failures?

539 views Asked by At

I was trying to reason about failure recovery actions that can be taken by systems/frameworks which guarantee synchronous data sources. I've been unable to find a clear explanation of Narayana's recovery mechanism.

Q1: Does Narayana essentially employ a 2-phase commit to ensure distributed transactions across 2 datasources?

Q2: Can someone explain Narayana's behavior in this scenario?

  1. Application wants to save X to 2 data stores
  2. Narayana's transaction manager (TM) generates a transaction ID and writes info to disk
  3. TM now sends a prepare message to both data stores
  4. Each data store responds back with prepare_success
  5. TM updates local transaction log and sends a commit message to both data stores
  6. TM fails (permanently). And because of packet loss on the network, only one data store receives the commit message. But the other data stores receives and successfully processes the commit message.

The two data stores are now out of sync with each other (one source has an additional transaction that is not present in the other source).

When a new TM is brought up, it does not have access to the old transaction state records. So the TM cannot initiate the recovery of the missing transaction in one of the data stores.

So how can 2PC/Narayana/XA claim that they guarantee distributed transactions that can maintain 2 data stores in sync? From where I stand, they can only maintain synchronous data stores with a very high probability, but they cannot guarantee it.

Q3: Another scenario where I'm unclear on the behavior of the application/framework. Consider the following interleaved transactions (both on the same record - or at least with a partially overlapping set of records):

  • Di = Data source i
  • Ti = Transaction i
  • Pi = prepare message for transaction i

D1 receives P1; responds P1_success

D2 receives P2; responds P2_success

D1 receives P2; responds P2_failure

D2 receives P1; responds P1_failure

The order in which the network packets arrive at the different data sources can determine which prepare request succeeds. Does this not mean that at high transaction speeds for a contentious record - it is possible that all transactions will keep failing (until the record experiences a lower transaction request rate)?

One could argue that we are choosing consistency over availability but unlike ACID systems there is no guarantee that at least one of the transactions will succeed (thus avoiding a potentially long-lasting deadlock).

1

There are 1 answers

4
chalda On BEST ANSWER

I would refer you to my article on how Narayana 2PC works https://developer.jboss.org/wiki/TwoPhaseCommit2PC

To your questions

Q1: you already mentioned that in the comment - yes, Narayana uses 2PC = Narayana implements the XA specification (pubs.opengroup.org/onlinepubs/009680699/toc.pdf).

Q2: The steps in the scenario are not precise. Narayana writes to disk at time of prepare is called, not at time the transaction is started.

  1. Application saves X to 2 data stores
  2. TM now sends a prepare message to both data stores
  3. Each data store responds back with prepare_success
  4. TM saves permanently info about the prepared transaction and its ID to transaction log store
  5. TM sends a commit message to both data stores
  6. ...

I don't agree that 2PC claims to guarantee to maintain 2 data stores in sync. I was wondering about this too (e.g. asked here https://developer.jboss.org/message/954043). 2PC claims guaranteeing ACID properties. Having 2 stores in sync is kind of what CAP consistency is about.

In this Narayana strictly depends on capabilities of particular resource managers (data stores or jdbc drivers of data stores). ACID declares

  • atomicity - whole transaction is committed or rolled-back (no info when it happens, no info about resources in sync)
  • consistency - before and when the transaction ends the system is in consistent state
  • durability - all is stored even when a crash occurs
  • isolation - (tricky one, left at the end) - for being ACID we have to be serializable. That's you can observe transactions happening "one by one". If I take a pretty simplified example, to show my point - expecting DB being implemented in a naive way of locking whole database when transaction starts - you committed jms message, that's processed and now you don't commit the db record. When DB works in the serializable isolation level (that's what ACID requires!) then your next write/read operation has to wait until the 'in-flight prepared' transaction is resolved. DB is just stuck and waiting. If you read you won't get answer so you can't say what is the value. The Narayana's recovery manager then come to that prepared transaction after connection is established and commit it. And you read action returns information that is 'correct'.

Q3: I don't understand the question, sorry. But if you state that The order in which the network packets arrive at the different data sources can determine which prepare request succeeds. then you are right, you are doomed to get failing transaction till network become more stable.