Doesn't Paxos end up with the same instructions in the exact same order?

95 views Asked by At

I am trying to understand why ZAB is required when Paxos does the same thing. I read multiple documentations and articles and all summed up that Zookeeper requires the instructions to be in the exact same order. But with basic paxos, doesn't the leader pitch messages for the log with the exact indexes? So even if the messages are reordered, eventually the messages would go to the exact same log index and end up with the exact same log. So, doesn't it mean that in Paxos, all replicas get the same instructions in the exact same order?

So please answer my questions in the order like

  1. What exactly is message reordering and why Paxos cannot handle it?
  2. Paxos was not originally intended for atomic broadcasting? By Atomic broadcasting, they mean to achieve the total order. But by putting the messages in the same indexes in the replicas' logs, doesn't Paxos achieve the total order eventually?

References.

2

There are 2 answers

6
AndrewR On BEST ANSWER

May I refer you to https://en.wikipedia.org/wiki/Atomic_broadcast ?

Consensus and atomic broadcast are equivalent to each other (as described in the link). So either can be used.

Paxos, technically, is a family of protocols. When one gets to a practical implementation, lots of edge cases need to be covered. E.g. nodes coming online and offline in an efficient way, nodes replacement, log archiving, etc.

ZAB is protocol which supports all of those cases according to specs needed by designers.

As a summary: both paxos and zab (and others) provide correctness, but these protocol differ in efficiency.

Edit after links added to the question:

From one of links:

Atomic Broadcast Guarantee: Zab guarantees atomic broadcast, meaning that either all nodes receive a message in the same order or none of them receive it. This ensures consistency across the distributed system and prevents inconsistencies caused by message reordering or losses.

Those reorderings and losses are about messages being passed in unreliable networks. And atomic broadcast guarantees that even with unstable networking, all nodes get all messages in the same order.

Since atomic broadcast and consensus are the same thing, one can use either to reach this goal. So both ZAB and Paxos can be used.

Your questions:

What exactly is message reordering and why Paxos cannot handle it?

Paxos handles reordering (of messages on network layer) just fine.

Paxos was not originally intended for atomic broadcasting? By Atomic broadcasting, they mean to achieve the total order. But by putting the messages in the same indexes in the replicas' logs, doesn't Paxos achieve the total order eventually?

Paxos was presented in a research paper about consensus. If I remember correctly, the original work was able to run a consensus round once, agreing on a single value. Paxos extensions are able to handle stream of values, which makes it look like atomic broadcast (which it is due to consensus and atomic broadcast being the same thing).

If one wants to replace ZAB with Paxos, they totally can. Probably some efficiency will be lost, but the correctness is still there. Here is more info: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

0
Yeshwanth N On

What exactly is message reordering and why Paxos cannot handle it?

The correct operation of paxos depends on some assumptions about message ordering:

  1. Phase Ordering: Paxos operates in multiple phases (like prepare and accept phases). The protocol assumes that messages within a particular phase are processed in the correct order. For instance, acceptors must receive and respond to prepared requests before processing accepted requests.

  2. Proposal Ordering: Paxos ensures that proposals are processed in a certain order, determined by a monotonically increasing proposal number. Acceptors must respond to proposals based on the highest-numbered proposal they have seen.

When messages are reordered significantly, these assumptions can be violated.

For example:

If an "accept" message arrives before the corresponding "prepare" message, an acceptor might accept a proposal without the proper context, leading to inconsistencies. Reordering can cause older proposals to be processed after newer ones, which can confuse the protocol state and potentially lead to incorrect decisions or deadlocks. However, it's important to clarify that Paxos can handle a certain degree of message reordering. The protocol's design, with its use of proposal numbers and the requirement for acceptors to keep track of the highest-numbered proposal they have seen, provides resilience against some level of reordering. What Paxos may struggle with are extreme or pathological cases of message reordering, where the assumptions about phase and proposal ordering are severely violated. GPT An extreme case of message reordering in Paxos that can pose a challenge would be a scenario where messages from different rounds of the protocol are significantly delayed or arrive out of sequence, leading to confusion and potentially incorrect decisions.

Let's consider an example with two proposers (P1 and P2) and a set of acceptors (A1, A2, A3,...) illustrating how severe message reordering can disrupt the consensus process.

Example Scenario:

Round 1 - Proposer P1:

P1 sends a prepare request with proposal number 1 to all acceptors. Due to network delays, these messages are significantly delayed.

Round 2 - Proposer P2:

While P1's messages are delayed, P2 sends a prepare request with proposal number 2. Acceptors A1 and A2 receive P2's request first and respond to P2, acknowledging proposal number 2.

Delayed Messages from Round 1 Arrive:

Now, the delayed prepare requests from P1 (proposal number 1) finally arrive at the acceptors. Suppose A3 receives P1's prepare request before seeing any message from P2. A3 responds to P1, acknowledging proposal number 1.

P1 Sends Accept Requests:

P1, having received a response from A3 (and possibly from others if their responses to P2 were also delayed), sends accept requests with its value and proposal number 1.

Confusion Among Acceptors:

Acceptors are now receiving accept requests from P1 for an older proposal number (1) while they have already acknowledged a higher proposal number (2) from P2. Depending on the exact timing and order of message arrivals, some acceptors might erroneously accept P1's proposal, or there could be a split where some acceptors are aligned with P1 and others with P2.

Potential Inconsistency:

This situation can lead to confusion and inconsistency. If some acceptors have already promised to not accept proposals lower than 2 (from P2's round) but then receive and act on P1's delayed accept messages, the protocol's invariant is violated. Different acceptors might end up in different states, and if a quorum somehow accepts P1's value, it might conflict with what P2 is trying to propose.

Paxos was not originally intended for atomic broadcasting? By Atomic broadcasting, they mean to achieve the total order. But by putting the messages in the same indexes in the replicas' logs, doesn't Paxos achieve the total order eventually?

Zab uses a combination of epoch numbers and transaction IDs to order messages. Each leader election results in a new epoch, and each transaction proposed by the leader within an epoch has a unique ID. This combination helps ensure that messages are processed in the order they were intended, even if they arrive out of order.

Phases of Operation: Zab operates in three phases - Discovery, Synchronization, and Broadcast:

Discovery: Determines the most recent epoch and the last committed transaction. This phase helps in recovering from leader changes and ensuring all nodes start from a consistent state. Synchronization: The new leader proposes an initial set of transactions to synchronize the state of all followers. This helps in aligning the state across the cluster before new transactions are broadcast. Broadcast: The leader sends new transactions to the followers. Transactions are delivered in order, and each transaction contains a monotonically increasing ID.

Using the same setup with two proposers (P1 and P2) and a set of acceptors (A1, A2, A3):

Epoch 1 - Leader P1: P1 is the leader and proposes a set of transactions. Due to network delays, these transactions are significantly delayed.

Leader Failure and New Epoch: P1 is perceived as failed. P2 is elected as the new leader, starting Epoch 2. During the Discovery phase, P2 identifies the highest transaction ID committed in Epoch 1. During Synchronization, P2 ensures all followers have all committed transactions from Epoch 1.

Epoch 2 - Leader P2: P2 now starts broadcasting new transactions in Epoch 2. These transactions include unique IDs that follow the IDs from Epoch 1.

Delayed Transactions from Epoch 1: If delayed transactions from P1 (Epoch 1) arrive now, they are either already committed and present in the followers' logs (and thus are ignored) or are not committed and are superseded by P2's transactions in Epoch 2. Followers apply transactions based on the order of epochs and transaction IDs, ensuring consistency.

Consistent State: Due to the clear separation of epochs and the synchronization phase, all followers maintain a consistent state. Transactions from a previous leader (P1) that arrive late do not disrupt the current state under the new leader (P2). Transactions in Zab are applied in a strict sequence, as determined by their epoch and transaction ID.