several questions about multi-paxos?

441 views Asked by At

I have several questions about multi-paxos

  1. will each instance has it's own proposal Number and accepted ballot and accepted value ? or all the instance share with the same proposal number ,after one is finished ,then anther one start?

  2. if all the instance share with the same proposal number ,Consider the below condition, server A sends a proposal ,and the acceptor returns the accepted instanceId which might be greater or less than the proposal'instanceid ,then what will proposal do? use that instanceId and it's value for accept phase? then increase it'own instanceId ,waiting for next round ,then re-proposal with it own value? if so , when is the previous accepted value removed,because if it's not removed ,the acceptor will return this intanceId and value again,then it seems it is a loop

1

There are 1 answers

0
rystsov On

Multi-Paxos has a vague description so two persons may build two different systems based on it and in a context of one system the answer is "no," and in the context of another it's "yes."

Approach #1 - "No"

Mindset: Paxos is a two-phase protocol for building write-once registers. Multi-Paxos is a technique how to create a log on top of them.

One of the possible ways to build a log is

  1. Create an array of completely independent write-once registers and initialize the first one with an initial value.
  2. On new record we should:

    A) Guess an index (X) of a vacant register and try to write a dummy record here (if it's already used then pick a register with a higher index and retry).

    B) Start writing dummy records to every register with smaller than X index until we find a register filled with a non-dummy record.

    C) Calculate a new record based on it (e.g., a record may have an ordinal, and we can use it to calculate an ordinal of the new record; since some registers are filled with dummy records the ordinals aren't equal to index) and write it to the X+1 register. In case of a conflict, we should restart the procedure from step A).

  3. To read the log we should start writing dummy values from the first record, and on each conflict, we should increment index and retry until the write is succeeded which would indicate that the log's end is reached.

Of course, there is a lot of overhead in this approach, so please treat it just like a top-level overview what Multi-Paxos is.

The log is a powerful concept, and we can use it as a recipe for building distributed state machines - just think of each record as an update command. Unfortunately, in some cases, there is also a lot of overhead. For example, if you want to build a key/value storage and you care only about the current value than you don't need history and probably need to implement garbage collection to remove past versions from the log to optimize storage costs.

Approach #2 - "Yes"

Mindset: rewritable register as a heavily optimized version of Multi-Paxos.

If you start with the described approach with an application to the creation of key/value storage and then iterate in other to get rid of overhead, e.g., by doing garbage collection on the fly then eventually you may come up with an idea how to update the write-once register to be rewritable.

In that case, each instance uses the same ballot numbers just because all the instances are collapsed into one rewritable instance.

I described this approach in the How Paxos Works post and implemented it in the Gryadka project with 500-lines of JavaScript. Also, the idea behind it was independently checked with TLA+ by Greg Rogers and Tobias Schottdorf.