Trouble saving repeated protobuf object to file (Python)

I'm new to protobuf, so I don't know how to frame the question correctly.

I'm using this Model Config proto file (model_server_config.proto). I compiled it to Python with this command from the Protocol Buffers page:

protoc -I=. --python_out=. ./model_server_config.proto

That gave me some Python files which I can import and work with. My objective is to create a config file (for running the TensorFlow model server with multiple models) which should look like the following:

model_config_list: {
  config: {
    name: "name1",
    base_path: "path1",
    model_platform: "tensorflow"
  },
  config: {
    name: "name2",
    base_path: "path2",
    model_platform: "tensorflow"
  },
  config: {
    name: "name3",
    base_path: "path3",
    model_platform: "tensorflow"
  },
}

Using the compiled Python package, I built a protobuf object which looks like this when I print it:

model_config_list {
  config {
    name: "name1"
    base_path: "path1"
    model_platform: "tensorflow"
  }
  config {
    name: "name2"
    base_path: "path2"
    model_platform: "tensorflow"
  }
  config {
    name: "name3"
    base_path: "path3"
    model_platform: "tensorflow"
  }
}

But when I serialize the object with objectname.SerializeToString(), I get this strange output:

b'\n\x94\x01\n \n\x04name1\x12\x0cpath1"\ntensorflow\n7\n\x08name2\x12\x1fpath2"\ntensorflow\n7\n\x08name3\x12\x1fpath3"\ntensorflow'

I also tried converting it to JSON with the protobuf package for Python, like this:

from google.protobuf.json_format import MessageToJson
MessageToJson(objectname)

which gave me a result like:

{
  "modelConfigList": {
    "config": [
      {
        "name": "name1",
        "basePath": "path1",
        "modelPlatform": "tensorflow"
      },
      {
        "name": "name2",
        "basePath": "path2",
        "modelPlatform": "tensorflow"
      },
      {
        "name": "name3",
        "basePath": "path3",
        "modelPlatform": "tensorflow"
      }
    ]
  }
}

with all the config objects collected into a list and every key written as a quoted string, which is not an acceptable format for the TensorFlow model server config.

Any ideas on how to write it to a file correctly? Or am I creating the objects incorrectly in the first place? Any help is welcome. Thanks in advance.

Best answer, by Rafael Lerm

I don't know what system will be reading your file, so I can't say how you should write it. It really depends on the format the Model Server expects to read.

That said, I don't see anything wrong with how you're creating the message, or any of the serialization methods you've shown.

  • Printing the message shows a "text format" proto, which is good for debugging and is sometimes used for storing configuration files. It's not very compact (field names are present in the file) and doesn't have all the backwards- and forwards-compatibility features of the binary representation. It's actually functionally the same as what you've said it "should look like": the colons before the braces and the commas are optional.
  • The SerializeToString() method uses the binary serialization format. This is arguably what Protocol Buffers were built to do. It's a compact representation and provides backwards and forwards compatibility, but it's not very human-readable.
  • As the name suggests, the json_format module provides a JSON representation of the message. That's perfectly good if the system you're interacting with expects JSON, but that's not especially common.
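
To make the point about the binary format a bit more concrete, here is a minimal sketch in plain Python (standard library only) that decodes a small byte string in the protobuf wire format. This is not the protobuf library's decoder: it only handles length-delimited fields, and the field numbers 1, 2 and 4 for name, base_path and model_platform are an assumption inferred from the tag bytes ('\n', '\x12' and '"') visible in your output. The sample bytes are written by hand for illustration, not copied from your output.

```python
def read_varint(data, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_length_delimited(data):
    """Decode a message whose fields are all length-delimited (wire type 2)."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = read_varint(data, pos)
        fields.append((field_number, data[pos:pos + length]))
        pos += length
    return fields


# Hand-written sample for a single config entry (assumed field numbers 1, 2, 4).
sample = b'\n\x05name1\x12\x05path1"\ntensorflow'
decoded = decode_length_delimited(sample)
print(decoded)
```

Each string is simply preceded by a tag byte and a length byte, which is why the output of SerializeToString() looks odd when printed but is still a perfectly valid encoding of the same data.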

Appendix: rather than relying on print(), you can use the google.protobuf.text_format module, which has utilities better suited to working with the text format programmatically. To write to a file, you could use:

from google.protobuf import text_format
(...)
with open(file_path, 'w') as output:
  text_format.PrintMessage(my_message, output)
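
As a rough sketch of a full round trip through the text format, the example below writes a message with text_format.MessageToString, reads it back with text_format.Parse, and then parses a variant that uses colons and commas to check that it yields the same message. Since I don't have your generated module, it uses descriptor_pb2.FileDescriptorProto (bundled with the protobuf package) purely as a stand-in for a message with a repeated field, and the file path is a temporary, hypothetical one. With your own code you would substitute your generated message class.

```python
import os
import tempfile

from google.protobuf import descriptor_pb2, text_format

# Stand-in message (an assumption for illustration): FileDescriptorProto has a
# repeated message field, message_type, much like the repeated config field.
message = descriptor_pb2.FileDescriptorProto(name="demo.proto")
for entry_name in ("name1", "name2", "name3"):
    message.message_type.add(name=entry_name)

# Hypothetical output location inside a fresh temporary directory.
file_path = os.path.join(tempfile.mkdtemp(), "demo.pbtxt")

# Write the text format to disk (text mode, not binary).
with open(file_path, "w") as output:
    output.write(text_format.MessageToString(message))

# Read it back into a fresh message of the same type.
with open(file_path) as source:
    restored = text_format.Parse(source.read(), descriptor_pb2.FileDescriptorProto())

# The same content with colons before the braces and commas between fields.
with_punctuation = text_format.Parse(
    'name: "demo.proto", '
    'message_type: { name: "name1" }, '
    'message_type: { name: "name2" }, '
    'message_type: { name: "name3" },',
    descriptor_pb2.FileDescriptorProto(),
)

print(restored == message, with_punctuation == message)
```

If both comparisons hold, the file written this way and the hand-punctuated version from the question describe the same message.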