Narayana/2PC/XA - Unlock resources after prepare message propagation failure

Consider this scenario.

  1. The coordinator sends prepare messages to two participants and then crashes.
  2. Both participants prepare successfully, locking their resources, and then wait for the coordinator to recover.
  3. The coordinator recovers, but it never received the participants' prepare_success messages.

Is manual intervention required to unlock the locked resources? Or do the participants poll the coordinator to find the status of the transaction?

At first glance, this sounds similar to the case where a participant does not receive a commit message. The main difference is that in that case the coordinator redrives the messages. In the scenario above, the coordinator does not even know that it has to redrive a global transaction, because no record of it was written to its log.

Accepted answer by chalda:

I can give some details on how Narayana handles this. The XA recovery strategy can vary between transaction manager implementations.

In Narayana, the scenario you describe is handled by a procedure called orphan detection. As you pointed out, the Narayana transaction manager crashed before the prepare phase finished, so the Narayana log contains no record that the transaction exists. The requirement here is that the Narayana configuration must know about all possible participants. In WildFly this is ensured by defining the datasources or resource managers in standalone.xml. The recovery process queries every available resource with the XAResource.recover call ( https://docs.oracle.com/javase/7/docs/api/javax/transaction/xa/XAResource.html#recover(int) ). Each resource returns the Xids of all in-doubt transactions it is aware of.
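To make the scan step concrete, here is a minimal sketch using the standard javax.transaction.xa interfaces. It is not Narayana's code: SimpleXid, PreparedOnlyResource and RecoveryScan are hypothetical names, the in-memory resource stands in for a real database, and issuing a single recover call with TMSTARTRSCAN | TMENDRSCAN is a simplifying assumption (a recovery manager may split a scan into several calls). It illustrates why every participant must be configured: only resources in the configured list are ever asked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical Xid value object for the sketch (not a Narayana class).
final class SimpleXid implements Xid {
    private final byte[] gtrid;
    SimpleXid(byte[] gtrid) { this.gtrid = gtrid.clone(); }
    @Override public int getFormatId() { return 4711; } // arbitrary format id
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return new byte[] {1}; }
}

// Hypothetical resource manager: it remembers the branches it has prepared,
// i.e. the ones still holding locks while waiting for an outcome.
final class PreparedOnlyResource implements XAResource {
    final List<Xid> inDoubt = new ArrayList<>();
    @Override public Xid[] recover(int flag) {
        // Report all in-doubt branches when a scan starts; nothing on other calls.
        return (flag & TMSTARTRSCAN) != 0 ? inDoubt.toArray(new Xid[0]) : new Xid[0];
    }
    @Override public int prepare(Xid xid) { inDoubt.add(xid); return XA_OK; }
    @Override public void rollback(Xid xid) { inDoubt.remove(xid); }
    @Override public void commit(Xid xid, boolean onePhase) { inDoubt.remove(xid); }
    @Override public void forget(Xid xid) { inDoubt.remove(xid); }
    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

final class RecoveryScan {
    // Collect the in-doubt Xids of every configured resource. A resource that
    // is not in this list is never asked, so its locked branches stay invisible.
    static List<Xid> scanAll(List<XAResource> configured) {
        List<Xid> all = new ArrayList<>();
        for (XAResource resource : configured) {
            try {
                Xid[] found = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
                if (found != null) {
                    all.addAll(Arrays.asList(found));
                }
            } catch (XAException e) {
                // Resource unreachable: skip it; a later recovery pass can retry.
                System.err.println("recover failed with XA error code " + e.errorCode);
            }
        }
        return all;
    }
}
```

Dropping a resource from the configured list makes its prepared branches invisible to the scan, which is the reason every participant has to be declared up front (in WildFly, as a datasource or resource manager).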

The Xid was constructed by Narayana: it is passed to the resource during prepare, saved in the resource's transaction log, and returned to Narayana during recovery. It contains the transaction manager id (https://wildscribe.github.io/WildFly/11.0.CR1/subsystem/transactions/index.html -> node-identifier). Narayana checks whether the Xid belongs to the current Narayana instance, i.e. whether the node identifiers match. If it does, and the Narayana transaction log has no record of the Xid, then, following the 2PC presumed abort optimization (https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/7.0/html/development_guide/java_transaction_api_jta#about_the_presumed_abort_optimization), Narayana asks the resource to roll back. That effectively removes the locks.
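The orphan-detection decision can be sketched the same way. This is an illustrative assumption, not Narayana's implementation: NodeXid, LockingResource and OrphanDetection are hypothetical, the real Narayana Xid encodes the node identifier in its own binary format (here it is simply a "nodeId:txNumber" string used as the global transaction id), and the transaction log is modeled as a set of global transaction ids. A branch is rolled back only when it belongs to this node and has no log record; foreign branches and logged branches are left alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical Xid layout: global transaction id = "<nodeId>:<txNumber>" (UTF-8).
final class NodeXid implements Xid {
    private final String gtrid;
    NodeXid(String nodeId, long txNumber) { this.gtrid = nodeId + ":" + txNumber; }
    @Override public int getFormatId() { return 4711; } // arbitrary format id for the sketch
    @Override public byte[] getGlobalTransactionId() { return gtrid.getBytes(StandardCharsets.UTF_8); }
    @Override public byte[] getBranchQualifier() { return new byte[] {1}; }
}

// Hypothetical resource manager whose prepared branches still hold locks.
final class LockingResource implements XAResource {
    final List<Xid> inDoubt = new ArrayList<>();
    @Override public Xid[] recover(int flag) {
        return (flag & TMSTARTRSCAN) != 0 ? inDoubt.toArray(new Xid[0]) : new Xid[0];
    }
    @Override public int prepare(Xid xid) { inDoubt.add(xid); return XA_OK; }
    @Override public void rollback(Xid xid) { inDoubt.remove(xid); } // a real RM releases locks here
    @Override public void commit(Xid xid, boolean onePhase) { inDoubt.remove(xid); }
    @Override public void forget(Xid xid) { inDoubt.remove(xid); }
    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

final class OrphanDetection {
    // Rolls back branches that this node created but never logged; returns how many.
    static int rollbackOrphans(XAResource resource, String localNodeId, Set<String> loggedGtrids) {
        Xid[] inDoubt;
        try {
            inDoubt = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        } catch (XAException e) {
            return 0; // resource unavailable; a later recovery pass can try again
        }
        if (inDoubt == null) {
            return 0;
        }
        int rolledBack = 0;
        for (Xid xid : inDoubt) {
            String gtrid = new String(xid.getGlobalTransactionId(), StandardCharsets.UTF_8);
            if (!gtrid.startsWith(localNodeId + ":")) {
                continue; // another transaction manager's branch: leave it alone
            }
            if (loggedGtrids.contains(gtrid)) {
                continue; // a log record exists: normal recovery redrives the logged outcome
            }
            try {
                resource.rollback(xid); // presumed abort: no log record means rollback
                rolledBack++;
            } catch (XAException e) {
                // Leave the branch in doubt; it will be found again on the next scan.
            }
        }
        return rolledBack;
    }
}
```

The ":" separator in the prefix check keeps a node called "node1" from claiming branches created by, say, "node10".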