Why does Google Docs operational transformation err on the side of deletion?


I tried an experiment today: I opened two offline editors for the same Google document. In one, I bolded the first word. In the other, I deleted it. Regardless of which client I bring back online first, the word always ends up deleted.

First off, why is this the case? My understanding of operational transformation is that ordering matters. In the simple example of two people typing "a" and "b" respectively, if the server receives "a" first, it will enforce the output "ab" by transforming the second person's "b" event into a "pass one space, then add b" event, and vice versa.
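For reference, the "a"/"b" case can be sketched in a few lines. This is only an illustration of the transform idea, not how Google Docs implements it; the `Insert` class, `apply`, `transform`, and the `op_wins_ties` tie-break flag are all made-up names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index at which the text is inserted
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, applied: Insert, op_wins_ties: bool) -> Insert:
    """Rewrite `op` so it can be applied after `applied` has already run.

    `op_wins_ties` is a hypothetical tie-break: when both inserts target
    the same position, it decides whose text ends up first.
    """
    if applied.pos < op.pos or (applied.pos == op.pos and not op_wins_ties):
        # "pass over the other person's text, then insert"
        return Insert(op.pos + len(applied.text), op.text)
    return op


a = Insert(0, "a")
b = Insert(0, "b")

# The server received "a" first, so "b" is shifted one position to the right.
server_doc = apply(apply("", a), transform(b, a, op_wins_ties=False))

# Client B already typed "b" locally; the incoming "a" keeps position 0.
client_b_doc = apply(apply("", b), transform(a, b, op_wins_ties=True))
```

Both sides end up with the same string because the tie is broken the same way everywhere (the first-received insert stays in front).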

Secondly, if ordering doesn't matter, are there technical reasons as to why Google Docs has chosen to err on the side of deletion? Or are the reasons largely simplicity for users?


There are 3 answers

3
ehfeng On

It's not a question of erring on the side of deletion.

In cases where both clients have equally valid but differing versions of truth, Google Docs must elect to uphold one version, or else force users to resolve conflicts, something that is inherently complicated and hard to explain.

Thus, "truth" for Google Docs is consistency of the document, not discernment of intent. And consistency is most easily achieved through destruction of information - a sort of tendency to entropy.

All this is basically my semi-philosophical BS though...

0
osma On

OT does not try to discern intent; it applies transformations in an order that produces a consistent result. When you apply both of those changes to a document, it does not matter which order you apply them in.

"first second" -> "first second" (with "second" in bold) -> "first"

"first second" -> "first" -> "first"

In the second stream, the bold operation is performed on a zero-length string.

This is exactly the same result you would get if, in one of those documents, you had italicized the second word instead: the end result would be "first second" with "second" both bold and italic, regardless of transformation order. A delete transformation is no different.
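A minimal sketch of that order independence, assuming a toy document model (a list of `(character, styles)` pairs) and made-up helpers `delete`, `bold`, and `shift_range_after_delete`; none of this is Google's actual data model.

```python
def make_doc(text):
    # Each character carries its own (initially empty) set of styles.
    return [(ch, frozenset()) for ch in text]


def text_of(doc):
    return "".join(ch for ch, _ in doc)


def delete(doc, start, end):
    return doc[:start] + doc[end:]


def bold(doc, start, end):
    return [(ch, styles | {"bold"}) if start <= i < end else (ch, styles)
            for i, (ch, styles) in enumerate(doc)]


def shift_range_after_delete(start, end, del_start, del_end):
    """Transform a formatting range so it applies after a delete has run."""
    def shift(p):
        if p <= del_start:
            return p
        if p >= del_end:
            return p - (del_end - del_start)
        return del_start  # the position fell inside the deleted span
    return shift(start), shift(end)


doc = make_doc("first second")
bold_range = (6, 12)    # the word "second"
delete_range = (5, 12)  # " second"

# Stream 1: bold, then delete. Bolding moves no characters, so the
# delete needs no adjustment.
stream_1 = delete(bold(doc, *bold_range), *delete_range)

# Stream 2: delete, then bold. The bold range is transformed against
# the delete and collapses to a zero-length range.
bold_after_delete = shift_range_after_delete(*bold_range, *delete_range)
stream_2 = bold(delete(doc, *delete_range), *bold_after_delete)
```

In stream 2 the bold range collapses to `(5, 5)`, which is the "zero-length string" case: the operation still runs, it just has nothing left to format.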

1
adelriosantiago On

Here is (5 years later, I know) a graphical explanation of why this happens. It is, in fact, what @osma describes, but explained graphically:

When you bold a string in GDocs you are wrapping the string in a container, presumably <strong></strong>, but they may use some other syntax. For simplicity, let's say that bolding a string just requires a "+" at the beginning of the word, so that the text "lorem ipsum" becomes lorem +ipsum rather than lorem <strong>ipsum</strong>

1

Both Alice and Bob start with the text "Lorem ipsum"

2

Bob then deletes "ipsum". Notice that he sends the changeset retain(6), delete(5) to the server. A changeset is essentially a patch, Google probably used this library.

3

Now Alice bolds "ipsum" (adding "+"). She sends the changeset retain(6), insert(+), retain(5)

4

Both changesets are traveling to the server. The server knows nothing about these sets yet.

5

Assume the worst-case scenario: Bob's package arrives first, so the word is deleted on the server. The other scenario is obvious.

6

When Alice's package arrives, the server transforms it against Bob's deletion. The trailing retain(5) has nothing left to retain, so her changeset only adds a "+" to the text.

7

Both changesets are then broadcast to the clients. This is the first one.

8

And this is the second one.

9

After patching these changesets into the original text you end up with "Lorem +". The server and all clients now have the same text. The + symbol would later be erased by a common HTML cleanup process which eliminates empty tags like <tag></tag>.
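The whole walkthrough can be reproduced with a small sketch. It assumes changesets are lists of `("retain", n)`, `("delete", n)`, and `("insert", text)` tuples; `apply_changeset` and `transform` are illustrative helpers written for this answer, not the API of any particular library and not necessarily what Google does.

```python
def apply_changeset(doc, changeset):
    out, pos = [], 0
    for kind, arg in changeset:
        if kind == "retain":
            out.append(doc[pos:pos + arg])
            pos += arg
        elif kind == "delete":
            pos += arg          # skip the deleted characters
        elif kind == "insert":
            out.append(arg)
        else:
            raise ValueError(f"unknown component: {kind}")
    if pos != len(doc):
        raise ValueError("changeset must span the whole document")
    return "".join(out)


def transform(a, b):
    """Return (a2, b2) such that applying a then b2 equals applying b then a2."""
    a, b = list(a), list(b)
    a2, b2 = [], []
    i = j = 0
    while i < len(a) or j < len(b):
        # Inserts are handled first: the other side retains over the new text.
        if i < len(a) and a[i][0] == "insert":
            a2.append(a[i])
            b2.append(("retain", len(a[i][1])))
            i += 1
            continue
        if j < len(b) and b[j][0] == "insert":
            b2.append(b[j])
            a2.append(("retain", len(b[j][1])))
            j += 1
            continue
        if i >= len(a) or j >= len(b):
            raise ValueError("changesets cover different document lengths")
        (ka, na), (kb, nb) = a[i], b[j]
        n = min(na, nb)
        if ka == "retain" and kb == "retain":
            a2.append(("retain", n))
            b2.append(("retain", n))
        elif ka == "delete" and kb == "retain":
            a2.append(("delete", n))
        elif ka == "retain" and kb == "delete":
            b2.append(("delete", n))
        # If both sides delete the same text, neither needs to do anything.
        a[i], b[j] = (ka, na - n), (kb, nb - n)
        if a[i][1] == 0:
            i += 1
        if b[j][1] == 0:
            j += 1
    return a2, b2


start = "Lorem ipsum"
bob = [("retain", 6), ("delete", 5)]
alice = [("retain", 6), ("insert", "+"), ("retain", 5)]

alice_on_server, bob_for_alice = transform(alice, bob)

# Server: Bob's changeset arrived first, then Alice's transformed one.
server_doc = apply_changeset(apply_changeset(start, bob), alice_on_server)

# Alice's client: her own edit first, then Bob's transformed changeset.
alice_doc = apply_changeset(apply_changeset(start, alice), bob_for_alice)
```

Alice's transformed changeset loses its trailing retain(5) because Bob already deleted those five characters, which is why only the "+" survives on every replica.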

To test this demo go to: http://operational-transformation.github.io/visualization.html. There you can play with the texts and packages as they are sent/received.