I need to serialize a class of structs according to the TLV format with Serde. TLV can be nested in a tree format.
The fields of these structs are serialized normally, much like bincode
does, but before the field data I must include a tag (to be associated, ideally) and the length, in bytes, of the field data.
Ideally, Serde would recognize the structs that need this kind of serialization, probably by having them implement a TLV
trait. This part is optional, as I can also explicitly annotate each of these structs.
So this question breaks down into 3 parts, in order of priority:
1) How do I get the length data (from Serde?) before the serialization of that data has been performed?
2) How do I associate tags with structs (though I guess I could also include tags inside the structs)?
3) How do I make Serde recognize a class of structs and apply custom serialization?
Note that 1) is the (core) question here. I will post 2) and 3) as individual questions if 1) can be solved with Serde.
Brace yourself, long post. Also, for convention: I'm picking both type and length to be unsigned 4 byte big endian. Let's start with the easy stuff:
Regarding 3) (making Serde recognize a class of structs): that's really a separate question, but you can either do that via the `#[serde(serialize_with = …)]` attributes, or in your serializer's `fn serialize_struct(self, name: &'static str, _: usize)` based on the name, depending on what exactly you have in mind.
Regarding 2) (associating tags with structs): this is a known limitation of serde, and the reason protobuf implementations typically aren't based on serde (take e.g. `prost`), but have their own derive proc macros that allow you to annotate structs and fields with the respective tags. You should probably do the same, as it's clean and fast. But since you asked about serde, I'll pick an alternative inspired by `serde_protobuf`: if you look at it from a weird angle, serde is just a visitor-based reflection framework. It will provide you with structure information about the type you're currently (de-)serializing, e.g. it'll tell you the type, name and fields of the type you're visiting. All you need is a (user-supplied) function that maps from this type information to the tags. The serializer is handed that function, and you write it to supply the tags for your types.
If you only have one set of type → tag mappings, you could also put it into the serializer directly.
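As a rough illustration of such a mapping, here is a minimal sketch in plain Rust. Everything in it is hypothetical: `TypeInfo` only mimics the kind of information serde passes to a serializer, and `tag_for`, `TagTable`, the struct names and the tag values are made up for the example.

```rust
/// Hypothetical description of what the serializer is currently visiting.
/// serde hands a serializer similar information (e.g. the struct name passed
/// to `serialize_struct`); this enum is only an illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeInfo {
    U32,
    Str,
    Seq,
    Struct(&'static str),
}

/// User-supplied mapping from type information to a TLV tag.
/// The tag values and struct names are made up for this example.
pub fn tag_for(info: TypeInfo) -> Option<u32> {
    match info {
        TypeInfo::U32 => Some(0x01),
        TypeInfo::Str => Some(0x02),
        TypeInfo::Seq => Some(0x03),
        TypeInfo::Struct("Header") => Some(0x10),
        TypeInfo::Struct("Payload") => Some(0x11),
        TypeInfo::Struct(_) => None, // unknown struct: no tag registered
    }
}

/// A serializer can hold on to any such function, so different tag tables
/// can be plugged in without touching the serializer itself.
pub struct TagTable<'a> {
    lookup: &'a dyn Fn(TypeInfo) -> Option<u32>,
}

impl<'a> TagTable<'a> {
    pub fn new(lookup: &'a dyn Fn(TypeInfo) -> Option<u32>) -> Self {
        TagTable { lookup }
    }

    /// Tag encoded as 4 bytes, big endian (the convention picked above).
    pub fn tag_bytes(&self, info: TypeInfo) -> Option<[u8; 4]> {
        (self.lookup)(info).map(u32::to_be_bytes)
    }
}
```

With `TagTable::new(&tag_for)`, looking up `TypeInfo::Struct("Header")` yields the bytes `[0, 0, 0, 0x10]`, and an unregistered struct name yields `None`, which a real serializer would probably turn into an error.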
Regarding 1) (getting the length before serializing the data): the short answer is: Can't. The length can't be known without inspecting the entire structure (there could be Vecs in it, for example). But that also tells you what you need to do: inspect the entire structure first, deduce the length, and then do the serialization. And you have precisely one method for inspecting the entire structure at hand: serde. So, you'll write a serializer that doesn't actually serialize anything and only records the length.
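A real serde `Serializer` has far too many methods to show here, so the following is only a sketch of the idea in plain Rust. The traits `Sink` and `Describe` are hypothetical, heavily simplified stand-ins for serde's `Serializer` and `Serialize`; `LenCounter`, `Message` and `encoded_len` are likewise made up for illustration.

```rust
/// Hypothetical, much smaller stand-in for serde's `Serializer`: anything
/// that can receive the primitive values of a structure.
pub trait Sink {
    fn put_u32(&mut self, v: u32);
    fn put_bytes(&mut self, v: &[u8]);
}

/// Hypothetical stand-in for serde's `Serialize`: a value that can describe
/// itself to any `Sink`.
pub trait Describe {
    fn describe(&self, sink: &mut dyn Sink);
}

/// A sink that writes nothing and only adds up how many bytes a real
/// encoder would have produced.
#[derive(Default)]
pub struct LenCounter {
    pub len: usize,
}

impl Sink for LenCounter {
    fn put_u32(&mut self, _v: u32) {
        self.len += 4; // a u32 always takes 4 bytes
    }
    fn put_bytes(&mut self, v: &[u8]) {
        self.len += v.len();
    }
}

/// Example payload with a variable-sized field, so its length cannot be
/// known without walking it.
pub struct Message {
    pub id: u32,
    pub body: Vec<u8>,
}

impl Describe for Message {
    fn describe(&self, sink: &mut dyn Sink) {
        sink.put_u32(self.id);
        sink.put_bytes(&self.body);
    }
}

/// Run the length-only pass over a value.
pub fn encoded_len(value: &dyn Describe) -> usize {
    let mut counter = LenCounter::default();
    value.describe(&mut counter);
    counter.len
}
```

For a `Message { id: 7, body: vec![1, 2, 3] }`, `encoded_len` returns 7: four bytes for the `u32` plus three for the body. Nothing is allocated or written along the way.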
Fortunately, serialization is non-destructive, so you can use this first serializer to get the length, and then do the actual serialization in a second pass.
Since you already know the length of what you're serializing, the second serializer is relatively straightforward.
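The two passes could look roughly like this for a single, flat record. Again this is a sketch under assumptions, not a serde implementation: `Sink` is a hypothetical one-method stand-in for `Serializer`, and `LenCounter`, `Writer`, `Message`, its `TAG` value and `to_tlv` are invented for the example.

```rust
/// Hypothetical, much smaller stand-in for serde's `Serializer`.
pub trait Sink {
    fn put(&mut self, bytes: &[u8]);
}

/// Pass 1 sink: counts bytes, writes nothing.
pub struct LenCounter(pub usize);
impl Sink for LenCounter {
    fn put(&mut self, bytes: &[u8]) {
        self.0 += bytes.len();
    }
}

/// Pass 2 sink: appends the bytes to an output buffer.
pub struct Writer(pub Vec<u8>);
impl Sink for Writer {
    fn put(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
}

/// A value with a tag (made up here) that can feed its fields to any sink.
pub struct Message {
    pub id: u32,
    pub body: Vec<u8>,
}

impl Message {
    pub const TAG: u32 = 0x10;

    pub fn describe(&self, sink: &mut dyn Sink) {
        sink.put(&self.id.to_be_bytes());
        sink.put(&self.body);
    }
}

/// Encode one flat TLV record: tag and length are 4 bytes, big endian.
pub fn to_tlv(msg: &Message) -> Vec<u8> {
    // Pass 1: find out how long the value part will be.
    let mut counter = LenCounter(0);
    msg.describe(&mut counter);
    assert!(counter.0 <= u32::MAX as usize, "value too long for a 4-byte length");
    let len = counter.0 as u32;

    // Pass 2: the length is known, so the header can be written first.
    let mut writer = Writer(Vec::with_capacity(8 + counter.0));
    writer.put(&Message::TAG.to_be_bytes());
    writer.put(&len.to_be_bytes());
    msg.describe(&mut writer);
    writer.0
}
```

The same `describe` method drives both passes, which is the property the approach relies on: the value is walked twice, once by each sink.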
The only snag you may hit is that the `TLVLenVisitor` only gave you one length. But you have many TLV structures, recursively nested. When you want to write out one of the nested structures (e.g. a Vec), you just run the `TLVLenVisitor` again, for each element.
Playground
This also means that you may have to do many passes over the structure you're serializing. This might be fine if speed is not of the essence and you're memory-constrained, but in general, I don't think it's a good idea. You may be tempted to try to get all the lengths in the entire structure in a single pass, which can be done, but it'll either be brittle (since you'd have to rely on visiting order) or difficult (because you'd have to build a shadow structure which contains all the lengths).
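To make the cost of the repeated passes visible, here is a small sketch on a hand-rolled tree instead of serde types. `Node`, `value_len` and `write` are hypothetical names, and the `visits` counter exists only to show how often nodes get re-measured.

```rust
/// A tiny TLV tree: a node is either raw bytes or a list of child nodes.
/// Tags are made up; tag and length are 4 bytes, big endian.
pub enum Node {
    Leaf { tag: u32, data: Vec<u8> },
    Branch { tag: u32, children: Vec<Node> },
}

/// Length-only pass: size of the value part of `node`. `visits` counts how
/// many nodes were inspected, to make the repeated work visible.
pub fn value_len(node: &Node, visits: &mut usize) -> usize {
    *visits += 1;
    match node {
        Node::Leaf { data, .. } => data.len(),
        Node::Branch { children, .. } => {
            let mut total = 0;
            for child in children {
                // Each child contributes its own 8-byte header plus its value.
                total += 8 + value_len(child, visits);
            }
            total
        }
    }
}

/// Writing pass: before each record is written, the length pass is run
/// again for that record, so deep nodes are measured once per ancestor.
pub fn write(node: &Node, out: &mut Vec<u8>, visits: &mut usize) {
    let len = value_len(node, visits) as u32;
    match node {
        Node::Leaf { tag, data } => {
            out.extend_from_slice(&tag.to_be_bytes());
            out.extend_from_slice(&len.to_be_bytes());
            out.extend_from_slice(data);
        }
        Node::Branch { tag, children } => {
            out.extend_from_slice(&tag.to_be_bytes());
            out.extend_from_slice(&len.to_be_bytes());
            for child in children {
                write(child, out, visits);
            }
        }
    }
}
```

For a tree of four nodes nested three levels deep (a branch holding a leaf and a second branch, which in turn holds one leaf), the length pass inspects nodes eight times in total, and the deepest leaf is measured three times. The work grows with the depth of the nesting.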
Also, do note that this approach expects that two serializer invocations on the same struct traverse the same structure. But an implementer of `Serialize` is perfectly capable of generating random data on the fly or mutating itself via interior mutability, which would make this serializer generate invalid data. You can ignore that problem since it's far-fetched, or add a check to the `end` call and make sure the written length matches the actual written data.
Really, I think it'd be best if you don't worry about finding the length before serialization and write the serialization result to memory first. To do so, you can first write all length fields as a dummy value to a `Vec<u8>`. Then, after you serialize the content and know its length, you can overwrite the dummies.
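Stripped of all serde plumbing, the dummy-then-overwrite technique can be sketched as follows. `TlvBuf` and its methods `begin`, `bytes`, `end` and `finish` are hypothetical names; in a serde serializer, `begin` would roughly correspond to `serialize_struct` and `end` to the `end` call of the struct serializer.

```rust
/// Encoder that builds the whole TLV tree in memory in a single pass.
/// Tag and length are 4 bytes, big endian; the API names are made up.
#[derive(Default)]
pub struct TlvBuf {
    buf: Vec<u8>,
    /// Offsets of the length fields of the records that are still open.
    open: Vec<usize>,
}

impl TlvBuf {
    /// Start a record: write the tag and a dummy length, remember where.
    pub fn begin(&mut self, tag: u32) {
        self.buf.extend_from_slice(&tag.to_be_bytes());
        self.open.push(self.buf.len());
        self.buf.extend_from_slice(&[0u8; 4]); // placeholder length
    }

    /// Append value bytes to the innermost open record.
    pub fn bytes(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
    }

    /// Finish the innermost record: now its length is known, so the
    /// placeholder can be overwritten.
    pub fn end(&mut self) {
        let at = self.open.pop().expect("end() without matching begin()");
        let len = self.buf.len() - (at + 4);
        assert!(len <= u32::MAX as usize, "value too long for a 4-byte length");
        self.buf[at..at + 4].copy_from_slice(&(len as u32).to_be_bytes());
    }

    /// Hand out the finished buffer once every record has been closed.
    pub fn finish(self) -> Vec<u8> {
        assert!(self.open.is_empty(), "unclosed record");
        self.buf
    }
}
```

Because the length is computed from the bytes that were actually written, it cannot disagree with the data, even if the value being serialized behaves differently from one traversal to the next. The price is that the whole output has to sit in memory before it can be sent anywhere.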
Playground. And there you go, single pass TLV serialization with serde.