Consider this scenario.
- Coordinator sends prepare messages to 2 participants, and crashes
- Participants lock resources successfully, and then wait for the coordinator to recover
- Coordinator recovers, but it never received the prepare_success messages from the participants
Is manual intervention required to unlock the locked resources? Or do the participants poll the coordinator to find the status of the transaction?
At first glance, this sounds similar to the case where a participant does not receive a commit message, but the main difference is that the coordinator redrives the messages in that scenario. In the scenario listed above, the coordinator does not even know that it has to redrive a global transaction, because no record of it was made in its log.
Here are some details on how Narayana works. The XA recovery strategy can vary depending on the transaction manager implementation.
Narayana handles the scenario you describe with a procedure called orphan detection. As you pointed out, the Narayana transaction manager crashes before the prepare phase has finished, so there is no information about the transaction in the Narayana log. The requirement here is that the Narayana configuration has to know all possible participants. In WildFly this is ensured by the definitions of datasources or resource managers in `standalone.xml`.

The recovery process calls `XAResource.recover` (https://docs.oracle.com/javase/7/docs/api/javax/transaction/xa/XAResource.html#recover(int)) on all the available resources. Each resource returns the `Xid` of every in-doubt transaction it is aware of. The `Xid` was constructed by Narayana: it was passed to the resource during prepare, saved in the resource's transaction log, and is returned to Narayana during recovery. It contains the transaction manager id (https://wildscribe.github.io/WildFly/11.0.CR1/subsystem/transactions/index.html -> `node-identifier`).

Narayana checks whether the `Xid` belongs to the current Narayana instance (the node identifiers match). If it does, and there is no record of the `Xid` in the Narayana transaction log, then based on the 2PC presumed abort optimization (https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/7.0/html/development_guide/java_transaction_api_jta#about_the_presumed_abort_optimization) Narayana asks the resource to roll back. That effectively releases the locks.
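
To make the decision rule concrete, here is a minimal sketch of the orphan check in plain Java. It is not Narayana's actual code. `SimpleXid`, `RecoverableResource`, `FakeResource`, `rollbackOrphans` and the `loggedTxIds` set are all hypothetical names invented for this illustration. They stand in for `Xid`, for the `recover`/`rollback` part of `XAResource`, and for the transaction manager's log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OrphanDetectionSketch {

    /** Hypothetical, simplified stand-in for javax.transaction.xa.Xid. */
    public static final class SimpleXid {
        public final String nodeId; // plays the role of the node-identifier embedded in the Xid
        public final String txId;   // plays the role of the global transaction id
        public SimpleXid(String nodeId, String txId) {
            this.nodeId = nodeId;
            this.txId = txId;
        }
    }

    /** Hypothetical, simplified stand-in for the recover/rollback part of XAResource. */
    public interface RecoverableResource {
        List<SimpleXid> recover();     // in-doubt (prepared) transactions known to the resource
        void rollback(SimpleXid xid);  // rolls the branch back, releasing its locks
    }

    /** In-memory fake resource, used only to exercise the sketch. */
    public static final class FakeResource implements RecoverableResource {
        public final List<SimpleXid> inDoubt = new ArrayList<>();
        public final List<SimpleXid> rolledBack = new ArrayList<>();

        @Override
        public List<SimpleXid> recover() {
            return new ArrayList<>(inDoubt); // copy, so callers can roll back while iterating
        }

        @Override
        public void rollback(SimpleXid xid) {
            inDoubt.remove(xid);
            rolledBack.add(xid);
        }
    }

    /**
     * Rolls back every in-doubt Xid that was created by this node but has no
     * record in the transaction manager's log (presumed abort).
     * Returns the number of branches rolled back.
     */
    public static int rollbackOrphans(String localNodeId,
                                      Set<String> loggedTxIds,
                                      List<? extends RecoverableResource> resources) {
        int count = 0;
        for (RecoverableResource resource : resources) {
            for (SimpleXid xid : resource.recover()) {
                if (!xid.nodeId.equals(localNodeId)) {
                    continue; // created by another transaction manager: not ours to decide
                }
                if (loggedTxIds.contains(xid.txId)) {
                    continue; // has a log record: left to the normal recovery path
                }
                resource.rollback(xid); // no log record: presumed abort
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        FakeResource db = new FakeResource();
        db.inDoubt.add(new SimpleXid("node1", "tx-orphan"));  // prepared, coordinator crashed before logging
        db.inDoubt.add(new SimpleXid("node1", "tx-logged"));  // prepared and recorded in the log
        db.inDoubt.add(new SimpleXid("node2", "tx-foreign")); // belongs to a different node
        int n = rollbackOrphans("node1", Set.of("tx-logged"), List.of(db));
        System.out.println("rolled back " + n + " orphan(s)");
    }
}
```

In this model only `tx-orphan` is rolled back. The branch with a log record and the branch created by another node are both left alone, which is why a unique node identifier per transaction manager instance matters.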