Rust serde: Deserializing XML when order matters


I am building an XML parser in Rust using the serde-xml-rs crate. I have some XML that looks something like this:

<CharacterStyleRange>
    <Properties>
        <SomeProperty>value</SomeProperty>
    </Properties>
    <Content>First line</Content>
    <Br />
    <Content>Second line</Content>
    <Br />
    <Content>Third line</Content>
</CharacterStyleRange>

Since the order of <Content> and <Br /> matters, I am trying to parse it into a vector of enums in the following way:

#[derive(Default,Deserialize,Debug)]
#[serde(rename_all="PascalCase")]
pub struct CharacterStyleRange {
    properties: Option<Properties>,
    #[serde(alias="Content", alias="Br")]
    contents: Option<Vec<ContentOrLineBreak>>
}

#[derive(Deserialize,Debug)]
enum ContentOrLineBreak {
    Content(Content),
    Br(Br)
}

#[derive(Default,Deserialize,Debug)]
#[serde(rename_all="PascalCase")]
pub struct Content {
    #[serde(rename="$value")]
    text: String,
}

#[derive(Default,Deserialize,Debug)]
pub struct Br {
    text: String,
}

I am expecting to get a structure somewhat like this:

CharacterStyleRange {
    properties: Some(Properties {
        SomeProperty: "value"
    }),
    contents: Some([
        ContentOrLineBreak::Content(
            Content { text: "First line" }
        ),
        ContentOrLineBreak::Br(
            Br { text: "" }
        ),
        ContentOrLineBreak::Content(
            Content { text: "Second line" }
        ),
        ContentOrLineBreak::Br(
            Br { text: "" }
        ),
        ContentOrLineBreak::Content(
            Content { text: "Third line" }
        ),
    ])
}

However, I get the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { field: "unknown variant `First line`, expected `Content` or `Br`" }'

Is this approach possible using only serde's derived Deserialize implementation? Or should I write a custom function for parsing this structure?

I am also considering changing the <Br /> elements in the XML to <Content><Br/></Content> before parsing, in order to avoid this issue altogether.
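The preprocessing idea above could be sketched roughly as follows. This is only a textual rewrite using the standard library, not a real XML transformation: the helper name wrap_line_breaks is hypothetical, and it assumes the only spellings that occur are `<Br />` and `<Br/>`, with no attributes.

```rust
/// Hypothetical helper: wraps every self-closing Br element in a Content element.
/// Assumption: Br only appears as `<Br />` or `<Br/>`, never with attributes.
fn wrap_line_breaks(xml: &str) -> String {
    // Normalize the spaced form first so the second replace cannot double-wrap.
    xml.replace("<Br />", "<Br/>")
        .replace("<Br/>", "<Content><Br/></Content>")
}

fn main() {
    let xml = "<Content>First line</Content><Br /><Content>Second line</Content>";
    println!("{}", wrap_line_breaks(xml));
}
```

Note that the Content type would then have to accept either text or a nested Br element, so this moves the problem rather than removing it.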


There is 1 answer

Answer by yw07

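You could use quick_xml::de::from_str instead. In Cargo.toml (serde needs its derive feature for #[derive(Deserialize)]):

serde = { version = "1", features = ["derive"] }
quick-xml = { version = "0.30.0", features = ["overlapped-lists", "serialize"] }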

And then:

use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct CharacterStyleRange {
    // rename_all already maps this field to the <Properties> element.
    properties: Option<Properties>,
    // `$value` collects the remaining child elements in document order.
    #[serde(rename = "$value")]
    contents: Vec<ContentOrLineBreak>,
}

#[derive(Debug, Clone, Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct Properties {
    // rename_all maps this field to the <SomeProperty> element.
    some_property: String,
}

#[derive(Debug, Clone, Deserialize)]
enum ContentOrLineBreak {
    Content(String),
    Br,
}

#[test]
fn test_serde_when_order_matters() {
    let xml = r#"
    <CharacterStyleRange>
        <Properties>
            <SomeProperty>value</SomeProperty>
        </Properties>
        <Content>First line</Content>
        <Br />
        <Content>Second line</Content>
        <Br />
        <Content>Third line</Content>
    </CharacterStyleRange>
    "#;
    let character_style_range = quick_xml::de::from_str::<CharacterStyleRange>(xml).unwrap();
    println!("{:#?}", character_style_range);
}
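
Once the elements arrive as an ordered Vec, consuming them is a plain match over the enum. A minimal sketch, assuming you want the text back with a newline for each Br: the render_text helper is hypothetical, and the enum is redeclared here without Deserialize so the snippet compiles with the standard library alone.

```rust
// Same shape as the enum in the answer, minus the serde derive.
#[derive(Debug, Clone, PartialEq)]
enum ContentOrLineBreak {
    Content(String),
    Br,
}

/// Hypothetical helper: joins the items in document order,
/// emitting a newline for every Br.
fn render_text(contents: &[ContentOrLineBreak]) -> String {
    let mut out = String::new();
    for item in contents {
        match item {
            ContentOrLineBreak::Content(text) => out.push_str(text),
            ContentOrLineBreak::Br => out.push('\n'),
        }
    }
    out
}

fn main() {
    let contents = vec![
        ContentOrLineBreak::Content("First line".to_string()),
        ContentOrLineBreak::Br,
        ContentOrLineBreak::Content("Second line".to_string()),
    ];
    println!("{}", render_text(&contents));
}
```

Because the Vec preserves the order of Content and Br as they appeared in the XML, the rendered text keeps the original line structure.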