Protobuf lazy decoding of sub message

I am using proto3 (Java) in my projects. I have some huge protobufs with smaller messages embedded in them. Is there a way to achieve partial decoding of only the few nested sub-messages I want to look at? My current problem is that I need to join this huge proto-based record data with other records, but my joins are based on very small sub-messages. I don't want to decode the entire huge protobuf up front; I want to decode only the nested message (a string id) for the join, and then decode the entire protobuf only for the joined data.

I tried tagging fields with [lazy=true], but I don't see any difference in the generated code. I also benchmarked deserialization time with and without the lazy keyword, and it didn't seem to have any effect. Is this feature on by default for all fields? Or is it even possible? I do see a few classes such as LazyField.java and related test cases in the protobuf GitHub repository, so I assume this feature has been implemented.

There is 1 answer

Answered by kalyanswaroop

For those who come across this conversation later and find it hard to follow, here's what Marc is talking about:

If your object is something like

message MyBigMessage {
  string id = 1;
  int32 sourceType = 2;
  // ... many other fields here that would be expensive to parse ...
}

Say you get a block of bytes that you have to parse, but you only want to fully parse messages from a certain source, and maybe within a certain id range. You could first parse those bytes with another message type:

message MyFilterMessage {
  string id = 1;        // field number has to be 1 to match MyBigMessage
  int32 sourceType = 2; // field number has to be 2 to match MyBigMessage
  // ... and NOTHING ELSE here ...
}

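Then you could look at sourceType and id. If they match whatever you are filtering for, you could parse the bytes again, this time using MyBigMessage to parse the whole thing. This works because the parser treats any field number it does not know as an unknown field and does not decode it into nested message objects, so the first pass stays cheap.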
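To make the "cheap first pass" idea concrete, here is a rough sketch of what that pass boils down to at the wire-format level: walk the top-level fields, pick out field 1 (id) and field 2 (sourceType), and skip over everything else without decoding it. This is hand-rolled code with no protobuf dependency, not the library's API; the class name PartialDecode and its methods are made up for illustration. With the real library you would, I assume, just call the generated MyFilterMessage.parseFrom(bytes) first and MyBigMessage.parseFrom(bytes) only for records that pass the filter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of protobuf-java: scans top-level fields only.
final class PartialDecode {

    // Reads a base-128 varint at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    // Skips the payload of a field we do not care about.
    static void skip(byte[] buf, int[] pos, int wireType) {
        switch (wireType) {
            case 0: readVarint(buf, pos); break;               // varint
            case 1: pos[0] += 8; break;                        // 64-bit fixed
            case 2: pos[0] += (int) readVarint(buf, pos); break; // length-delimited
            case 5: pos[0] += 4; break;                        // 32-bit fixed
            default: throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }

    // Returns the top-level string field `wanted`, or null if it is absent.
    // Scans to the end so that the last occurrence wins.
    static String readStringField(byte[] buf, int wanted) {
        int[] pos = {0};
        String found = null;
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (field == wanted && wireType == 2) {
                int len = (int) readVarint(buf, pos);
                found = new String(buf, pos[0], len, StandardCharsets.UTF_8);
                pos[0] += len;
            } else {
                skip(buf, pos, wireType); // nested messages are jumped over, not decoded
            }
        }
        return found;
    }

    // Returns the top-level varint field `wanted`, or `defaultValue` if it is absent.
    static long readVarintField(byte[] buf, int wanted, long defaultValue) {
        int[] pos = {0};
        long found = defaultValue;
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (field == wanted && wireType == 0) {
                found = readVarint(buf, pos);
            } else {
                skip(buf, pos, wireType);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hand-encoded sample: id = "abc" (field 1), sourceType = 300 (field 2),
        // plus a 4-byte length-delimited field 3 standing in for the expensive part.
        byte[] record = {0x0A, 3, 'a', 'b', 'c', 0x10, (byte) 0xAC, 0x02, 0x1A, 4, 1, 2, 3, 4};
        System.out.println("id=" + readStringField(record, 1)
                + " sourceType=" + readVarintField(record, 2, 0));
    }
}
```

The point is that a length-delimited field you don't care about costs one length read and a pointer bump, no matter how big the nested message inside it is. The sketch does no bounds checking on lengths, so treat it as an illustration rather than something to feed untrusted input.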

One other thing to know: as of 2017, lazy parsing was disabled in Java (except for MessageSet), according to this post: https://github.com/protocolbuffers/protobuf/issues/3601#issuecomment-341516826 I don't know the current status. Too lazy to try to find out! :-)