In Rust it's possible to get UTF-8 from bytes by doing this:
if let Ok(s) = str::from_utf8(some_u8_slice) {
println!("example {}", s);
}
This either works or it doesn't, but Python has the ability to handle errors, e.g.:
s = some_bytes.decode(encoding='utf-8', errors='surrogateescape');
In this example the argument surrogateescape
converts invalid utf-8 sequences to escape-codes, so instead of ignoring or replacing text that can't be decoded, they are replaced with a byte literal expression, which is valid utf-8
. see: Python docs for details.
Does Rust have a way to get a UTF-8 string from bytes which escapes errors instead of failing entirely?
Yes, via
String::from_utf8_lossy
:If you need more control over the process, you can use
std::str::from_utf8
, as suggested by the other answer. However, there's no reason to double-validate the bytes as it suggests.A quickly hacked-up example:
You could extend this to take values to replace bad bytes with, a closure to handle the bad bytes, etc. For example:
See also: