Restrict string length in protobuf?

I'm trying to implement limits on my protobuf messages that can be shared amongst different clients that use different languages.

Amongst other things, I would like to implement restrictions on string length, such as a minimum or maximum length, or a range of min and max values for an int32 field.

Is there any way to implement such requirements? Thanks a lot for your help!

There are 2 answers

4
Marc Gravell On

That isn't something that is inbuilt into any implementation that I'm aware of, if you mean the serializer checking the length before serialization and/or after deserialization. You would have to check the data yourself.

In theory it could be added without massive effort but, a bit like required, it would end up being quite hostile to usage. IMO it isn't a significant omission, as shown by the fact that it also doesn't exist in most general-purpose serializers.
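
Checking the data yourself might look roughly like the following Python sketch. UserMessage is a hypothetical stand-in for a generated protobuf class, and the field names and limits are invented for illustration; none of this is part of any protobuf API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits; in practice every client would have to agree on them.
NAME_MIN_LEN = 1
NAME_MAX_LEN = 10
AGE_MIN = 0
AGE_MAX = 150


@dataclass
class UserMessage:
    """Stand-in for a generated protobuf message class (assumption)."""
    name: str = ""
    age: int = 0


def validate(msg: UserMessage) -> List[str]:
    """Return a list of constraint violations; an empty list means OK."""
    errors = []
    if not NAME_MIN_LEN <= len(msg.name) <= NAME_MAX_LEN:
        errors.append(
            f"name length {len(msg.name)} not in [{NAME_MIN_LEN}, {NAME_MAX_LEN}]"
        )
    if not AGE_MIN <= msg.age <= AGE_MAX:
        errors.append(f"age {msg.age} not in [{AGE_MIN}, {AGE_MAX}]")
    return errors
```

A function like this would be called before serializing and again after deserializing, and each client language would need its own equivalent.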

0
bazza On

This is something that is sorely missing from GPB. Constraints could be added to the schema language and code generators, but it would not touch the wireformat at all. I really, really wish Google would add this to GPB, because then it would be close to perfect.

By constraints, I mean four things that a schema language needs:

  1. The ability to define length of arrays / lists (e.g. strings)
  2. The ability to constrain the range of a value
  3. The ability to define the constraint parameters as constants that are also built into the generated code (so that programs can know what the constraint is and use them in, say, for loops)
  4. The ability for a constraint to be defined in terms of a constant parameter, but + or - 1 (again, for the benefit of iterating over things in the program)

Other serialisation standards do support constraints to varying degrees, namely JSON, XML (XSD) schema, and the granddaddy of them all ASN.1.

JSON schemas are, so far as I can tell, normally used to verify that a JSON message is correct; but that relies on the developer actually building that check into their code.

XSD schemas can in principle express constraints quite usefully; the problem is finding tool sets that actually build useful code. For example, Microsoft's XSD.exe is a very primitive code generator for XSD, and totally ignores any constraints expressed in the XSD schema.

For me, the one that wins is ASN.1. It's old, but there are some very modern tools and wireformats for it (yes, it can speak XML and JSON these days), with support for C, C++, C#, Java, Go, Python plus a few more if you dig around. ASN.1 constraints fulfil all four of those features above, e.g.

---Define an integer constant, value 360
maxValue INTEGER ::= 360
---Define an integer message type, constrained
Bearing ::= INTEGER (0..<maxValue)

That short snippet will give you a Bearing datatype, and if you try to serialise it with a value outside of the range 0..359, the serialiser will return an error (as will the deserialiser if it ever encounters malformed data). The < in the Bearing definition is the -1, and the maxValue being 360 means you can write sensible for loops such as for (int az = 0; az < maxValue; az++). Having the constant defined in the schema is super useful, because it's a single point of definition and if you want to change its value during a development project, you change it in only one place and recompile. It leads to a very agile way of defining interfaces.
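
For comparison, this is roughly what has to be written by hand to get the same behaviour around a protobuf int32 field. It is a Python sketch only: MAX_VALUE, check_bearing and the choice of ValueError are assumptions for illustration, not the output of any code generator.

```python
MAX_VALUE = 360  # mirrors maxValue in the ASN.1 schema above


def check_bearing(value: int) -> int:
    """Accept only 0 <= value < MAX_VALUE, i.e. 0..359."""
    if not 0 <= value < MAX_VALUE:
        raise ValueError(f"bearing {value} outside 0..{MAX_VALUE - 1}")
    return value


# The same constant drives iteration, like the C-style for loop above.
all_bearings = [check_bearing(az) for az in range(MAX_VALUE)]
```

The difference from ASN.1 is that nothing ties MAX_VALUE to the schema, so each language binding has to repeat the constant and the check.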

---Length constraint
maxLen INTEGER ::= 10
---Constrained length string
Name ::= IA5String (SIZE(1..maxLen))
---Another constrained length string
Surname ::= IA5String (SIZE(1 | 5<..<maxLen | 100))

Name ends up having to be between 1 and 10 characters long. Surname has to be either 1, 6 to 9, or 100 characters long.
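
Written out by hand, those two size constraints amount to something like the Python sketch below. The function names are hypothetical, and len() counts Unicode code points, which matches the character count only because an IA5String is restricted to ASCII.

```python
MAX_LEN = 10  # mirrors maxLen in the ASN.1 schema above


def name_ok(s: str) -> bool:
    """Name: between 1 and MAX_LEN characters inclusive."""
    return 1 <= len(s) <= MAX_LEN


def surname_ok(s: str) -> bool:
    """Surname: length 1, or strictly between 5 and MAX_LEN, or exactly 100."""
    n = len(s)
    return n == 1 or 5 < n < MAX_LEN or n == 100
```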

And, believe it or not, this is just a tiny taste of what constraints in ASN.1 can do. They can also theoretically include regular expressions. Understandably then, implementing all of ASN.1's constraints in code generators is a bit of a nightmare, which is why the best tools for ASN.1 are commercial and not very cheap, but the end result in a big complicated system development involving a variety of languages and platforms is very useful.

Sadly, Google Protocol Buffers doesn't do any of this.

ASN.1 also allows you to define constant values of any message type (INTEGER, REAL, STRING, constructed complex types, etc).

Best You Can Do with GPB

The best you can do with GPB is to comment up the .proto file and hope the developers read it.

One trick is to deliberately have a syntax error in the schema, so that it won't compile. That will at least make your fellow developers read the broken part of the schema, and hopefully take heed of any comments you've put there. If you want to change the value, break the schema again to make them read it again.

Depending on your system, the constraint could instead be carried as another int field in the message. Then only one part of the system needs to set it, and hopefully the rest of the system can read it and use it.
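
A minimal sketch of that last idea in Python, assuming a hypothetical Greeting message with an extra max_text_len field (the dataclass stands in for a generated protobuf class):

```python
from dataclasses import dataclass


@dataclass
class Greeting:
    """Stand-in for a generated protobuf message class (assumption)."""
    text: str = ""
    max_text_len: int = 0  # set by the one component that owns the limit


def receiver_accepts(msg: Greeting) -> bool:
    """The receiver reads the limit from the message instead of hard-coding it."""
    return len(msg.text) <= msg.max_text_len
```

This keeps the limit in one place, but the receiver is trusting whatever value the sender put in the field, so it is a convention rather than an enforced constraint.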