I am currently studying two-phase commit (2PC) and three-phase commit (3PC).
The 3PC protocol tries to eliminate the 2PC protocol's system-blocking problem by adding an extra phase, preCommit, as mentioned here.
According to this post, if the coordinator crashes at any point, a recovery node can take over the transaction and query the state from any remaining replicas. For example, if any remaining replica replies to the recovery node that it is in the pre-commit state, then the recovery node knows that the failed coordinator sent the pre-commit message and that all replicas had agreed to commit.
My question is: why can't two-phase commit do the same thing? When the coordinator fails, the recovery node could query the remaining nodes and see whether any of them is already in the commit phase.
I have read several posts, but I still don't understand exactly what problem three-phase commit is trying to solve and how it solves it.
Please help!
A recovery node can query the remaining nodes, but what happens if another node crashes before the recovery node gathers all the messages?
The two-phase commit protocol cannot proceed until each participant acknowledges each message.
Two-phase commit waits and blocks resources in the following scenarios.
1- The coordinator fails after initiating the prepare phase and a new coordinator is elected. If another node crashes before the recovery node gathers all the phase 1 messages, the protocol can't proceed: all the other participants may have agreed to commit, but the newly crashed node might have intended to abort, so the recovery node can't declare the decision a commit. The same argument applies in the other direction.
2- Similarly, if a participant fails during phase 1 before the coordinator receives its response, the same thing happens: the coordinator doesn't know the failed node's vote and hence can't proceed to commit or abort.
3- If a participant, or both the coordinator and a participant, fail during phase 2, the coordinator can't tell whether the transaction was committed.
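To make scenario 1 concrete, here is a minimal sketch of the decision a 2PC recovery node faces. The state names and the `recover_2pc` function are hypothetical, made up for this illustration, and the sketch assumes the failed coordinator was not itself a participant. A crashed participant is represented as `None` because its state cannot be queried.

```python
from typing import Optional, Sequence

# Hypothetical participant states for this sketch (not from any real library).
INIT, READY, COMMITTED, ABORTED = "INIT", "READY", "COMMITTED", "ABORTED"


def recover_2pc(states: Sequence[Optional[str]]) -> str:
    """Return what a recovery node can safely do; None marks a crashed node."""
    reachable = [s for s in states if s is not None]
    if COMMITTED in reachable:
        # Someone already saw the commit decision, so the decision was commit.
        return "COMMIT"
    if ABORTED in reachable or INIT in reachable:
        # An abort was seen, or a node never voted, so commit was never decided.
        return "ABORT"
    if len(reachable) < len(states):
        # Everyone reachable voted yes, but a crashed node may have voted no
        # (or may already have acted on a decision): nothing is safe to decide.
        return "BLOCKED"
    # Assumption of this sketch: all votes are known to be yes.
    return "COMMIT"


print(recover_2pc([READY, READY, None]))      # BLOCKED
print(recover_2pc([READY, COMMITTED, None]))  # COMMIT
print(recover_2pc([READY, INIT, None]))       # ABORT
```

The first call is the blocking case described above: the reachable nodes alone do not carry enough information to decide.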
The same scenarios with three-phase commit:
1- In the prepare phase, if a participant doesn't hear from the coordinator in time, it aborts. The coordinator sends aborts to all if it doesn't hear from any participant. (In fact, the same approach could be taken with two-phase commit, but the two-phase commit protocol cannot proceed until each participant acknowledges each message.)
2- The same approach is taken in the prepare phase: if the coordinator times out waiting for a participant, it assumes the participant crashed and tells everyone to abort.
3- The coordinator doesn't know whether the participant failed before or after committing, hence it can't proactively decide whether the transaction is committed. So this step is also similar to the two-phase commit protocol.
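The timeout rules in items 1 and 2 can be sketched as follows. This is only an illustration under my own assumptions: the function names, the phase label `"PREPARE"`, and the return values are hypothetical, and the later phases are deliberately left unmodeled because the list above does not give a timeout rule for them.

```python
from typing import Dict, Optional


def coordinator_prepare_decision(votes: Dict[str, Optional[bool]]) -> str:
    """votes maps a participant name to True (yes), False (no) or None (timed out)."""
    # A missing reply is treated as a crash, so anything other than
    # a yes from every participant makes the coordinator abort.
    if all(v is True for v in votes.values()):
        return "PROCEED"
    return "ABORT"


def participant_on_timeout(phase: str) -> str:
    """What a participant does when the coordinator stays silent."""
    if phase == "PREPARE":
        # Item 1: no word from the coordinator in the prepare phase means abort.
        return "ABORT"
    # Later phases are not covered by this sketch.
    return "UNDECIDED"


print(coordinator_prepare_decision({"a": True, "b": True}))  # PROCEED
print(coordinator_prepare_decision({"a": True, "b": None}))  # ABORT
print(participant_on_timeout("PREPARE"))                     # ABORT
```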
So why is there an additional step? Other folks claim the following:
"The aim of this is to remove the uncertainty period for participants that have committed and are waiting for the global abort or commit message from the coordinator."
I don't agree with that claim. It seems to me that the additional phase just gives the coordinator a chance to safely roll back the entire operation if a coordinator failure occurs.
Confusing articles have been written on this topic. These are my conclusions, and I am always open to revising my answer.