The protocol overview from Google states that:

Protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text.

I'm not sure I understand. Could someone give me a proper example or explanation of this?

1 Answer

Kenton Varda On Best Solutions

Imagine the following piece of HTML:

<p>Hello! <strong>This text is strong (bold).</strong> <em>This is
emphasized (italic).</em> And <a href="https://example.com">here is
a link</a>.</p>

If you created a reasonable Protobuf representation of rich text and then wrote this out in Protobuf text format, it might look something like:

{
  p: {
    children: { text: "Hello! " }
    children: {
      strong: {
        children: { text: "This text is strong (bold)." }
      }
    }
    children: { text: " " }
    children: {
      em: {
        children: { text: "This is emphasized (italic)." }
      }
    }
    children: { text: " And " }
    children: {
      a: {
        href: "https://example.com"
        children: { text: "here is a link" }
      }
    }
    children: { text: "." }
  }
}

As you can see, the Protobuf representation appears very complex. The underlying text is no longer readable, as the structure dominates.

Now, in terms of the actual data structure, the Protobuf representation isn't very different from what an HTML/XML parser would build. In code, it might be no more difficult to work with. And the binary serialization of the Protobuf might be reasonable. You might even save a few bytes compared to the XML representation (though probably not a lot, since most of the space will still be taken by the underlying text).
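To make the "not very different from what a parser would build" point concrete, here is a minimal Python sketch. It parses the HTML snippet from above with the standard library's `xml.etree.ElementTree` and converts the result into nested dicts shaped like the Protobuf text-format example. The `to_tree` helper and the dict layout are my own illustration, not part of any Protobuf API, and the snippet is joined onto one line for simplicity.

```python
import xml.etree.ElementTree as ET

# The HTML snippet from above, written as a single line.
HTML = (
    '<p>Hello! <strong>This text is strong (bold).</strong> <em>This is '
    'emphasized (italic).</em> And <a href="https://example.com">here is '
    'a link</a>.</p>'
)

def to_tree(elem):
    """Convert an element into {tag: {<attributes>, "children": [...]}}.

    Hypothetical helper: text runs and child elements are interleaved in
    one ordered "children" list, as in the Protobuf example.
    """
    children = []
    if elem.text:
        # Text that appears before the first child element.
        children.append({"text": elem.text})
    for child in elem:
        children.append(to_tree(child))
        if child.tail:
            # Text that follows this child, up to the next sibling.
            children.append({"text": child.tail})
    body = dict(elem.attrib)
    body["children"] = children
    return {elem.tag: body}

tree = to_tree(ET.fromstring(HTML))
for node in tree["p"]["children"]:
    print(node)
```

The `<p>` element ends up with seven children (four text runs and three nested elements), which is the same shape as the `children:` entries in the text-format dump. The parser's in-memory tree carries just as much structure; XML syntax simply hides it better when a human reads the source.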

If you were writing a WYSIWYG rich text editor where the users never see the underlying representation, then using Protobuf to represent the text like above could make a lot of sense.

What the text you quote (which I wrote, BTW!) is trying to say is that if you have a use case where a human is authoring text with markup, but has to do so in a plain text editor, then Protobuf is not a good solution. HTML or XML work much better for text markup.

OTOH, if you have a human authoring highly structured data in a plain-text format, then Protobuf text format might work rather well! For example, many people write config files this way -- and lots and lots of people use JSON for this, which works out to be pretty similar. Meanwhile, XML turns out to be very cumbersome and painful for these use cases.
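As a rough illustration of the config-file case, the sketch below serializes one small, made-up config (the `server`, `host`, `port`, and `tags` names are purely hypothetical) as both JSON and XML using only the Python standard library. The `to_xml` helper is one naive mapping I chose for the example; real XML config schemas vary.

```python
import json
import xml.etree.ElementTree as ET

# A made-up config used only for illustration.
config = {"server": {"host": "localhost", "port": 8080, "tags": ["a", "b"]}}

def to_xml(name, value):
    """Naively map dicts to nested elements and lists to <item> elements."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml("item", item))
    else:
        # Scalars become text, so the number 8080 turns into the string "8080".
        elem.text = str(value)
    return elem

as_json = json.dumps(config)
as_xml = ET.tostring(to_xml("config", config), encoding="unicode")

print(as_json)
print(as_xml)
```

Every field name appears twice in the XML (open and close tag), lists need an invented wrapper element, and scalar types are lost unless a schema puts them back. Here the structure dominates the data, which is the mirror image of the marked-up text case.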

With all that said, when humans are entering the data, it probably makes sense to use a really human-optimized format. For text with markup, use Markdown. For structured data like config files, YAML is pretty good. But note that Markdown and YAML do not work well as an interchange format between two computers.