Is it possible to discard items that fail to parse and continue parsing with serde_json?


I have a very large JSON file. Most of it is valid JSON data, but parts of it are not. The following is a simplification of my case:

[
    "this is valid: \ud835\udc47",
    "this is invalid: \ud835"
]

The first item is valid and parses successfully, but deserialization fails on the second item. \ud835 is a high (leading) surrogate: it is not a valid Unicode scalar value, so it cannot be encoded in UTF-8 at all, and in UTF-16 it is only valid when immediately followed by a low (trailing) surrogate such as \udc47.
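To show the surrogate problem in isolation, here is a minimal sketch that uses only Rust's standard library and no serde_json. The helper names `decode_strict` and `decode_lossy` are hypothetical, and the code assumes the `\uXXXX` escapes have already been extracted into UTF-16 code units. It does not show how serde_json behaves internally; it only demonstrates why a lone `\ud835` cannot become a Rust `String` unless it is replaced.

```rust
/// Strictly decode UTF-16 code units (as written by JSON `\uXXXX` escapes).
/// Returns `None` if any surrogate is unpaired.
/// (Hypothetical helper for illustration.)
fn decode_strict(units: &[u16]) -> Option<String> {
    char::decode_utf16(units.iter().copied())
        .collect::<Result<String, _>>()
        .ok()
}

/// Leniently decode UTF-16 code units, replacing each unpaired surrogate
/// with U+FFFD REPLACEMENT CHARACTER.
/// (Hypothetical helper for illustration.)
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

fn main() {
    // "\ud835\udc47" is a complete surrogate pair encoding U+1D447.
    let valid: [u16; 2] = [0xD835, 0xDC47];
    // "\ud835" alone is a high surrogate with no partner.
    let invalid: [u16; 1] = [0xD835];

    // A surrogate code point is not a Unicode scalar value,
    // so it can never be a Rust `char`.
    assert!(char::from_u32(0xD835).is_none());

    println!("strict valid:   {:?}", decode_strict(&valid));
    println!("strict invalid: {:?}", decode_strict(&invalid));
    println!("lossy invalid:  {:?}", decode_lossy(&invalid));
}
```

Whether dropping the item (the strict path returning `None`) or substituting U+FFFD (the lossy path) is acceptable depends on how much of the original data has to survive the migration.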

This issue arose because an HTTP server used Python's built-in JSON deserializer and saved the data to a database. Python's deserializer accepted the lone "\ud835", which is not valid UTF-8 or UTF-16. Now that we want to migrate this application and database to Rust with serde, serde_json rejects this invalid string.
