XmlReader read continually

I have a very large XML file. This is a simplified version of its format:

<?xml version='1.0' encoding='UTF-8'?>
<Sender>
 <SenderID>571099948</SenderID>
 <Sponsors>
  <Sponsor>
    <SponsorID>TEST01</SponsorID>
    <Contracts>
      <Contract>
        <ContractID>000001</ContractID>
        <Member>
          <SSN>1111111111</SSN>
          <Gender>M</Gender>
          <Benefits>
            <Benefit BenefitType="AAA">
            </Benefit>
            <Benefit BenefitType="BBB">
            </Benefit>
          </Benefits>
        </Member>
        <Member>
          <SSN>4444444444</SSN>
          <Gender>F</Gender>
          <Benefits>
            <Benefit BenefitType="AAA">
            </Benefit>
          </Benefits>
        </Member>
      </Contract>
      <Contract>
        <ContractID>0000002</ContractID>
        <Member>
          <SSN>2222222222</SSN>
          <Gender>F</Gender>
          <Benefits>
            <Benefit BenefitType="CCC">
            </Benefit>
            <Benefit BenefitType="DDD">
            </Benefit>
          </Benefits>
        </Member>
      </Contract>
      <Contract>
        <ContractID>0000003</ContractID>
        <Member>
          <SSN>333333333</SSN>
          <Gender>F</Gender>
          <Benefits> 
            <Benefit BenefitType="CCC">
            </Benefit>
          </Benefits>
        </Member>
      </Contract>
    </Contracts>
  </Sponsor>
  <Sponsor>
    <SponsorID>TEST02</SponsorID>
    <Contracts>
      <Contract>
        <ContractID>0000011</ContractID>
        <Member>
          <SSN>1111111111</SSN>
          <Gender>M</Gender>
          <Benefits>
          </Benefits>
        </Member>
      </Contract>
      <Contract>
        <ContractID>0000002</ContractID>
        <Member>
          <SSN>2222222222</SSN>
          <Gender>F</Gender>
          <Benefits>
          </Benefits>
        </Member>
      </Contract>
    </Contracts>
  </Sponsor>
</Sponsors>
</Sender>

I want to get all the information in each Contract node, as well as the SponsorID from its parent Sponsor node. Here is the code I use to read part of the XML file with XmlReader:

    static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(inputUrl))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.Name == elementName)
                    {
                        XElement el = XNode.ReadFrom(reader) as XElement;
                        if (el != null)
                        {
                            yield return el;
                        }
                    }
                }
            }
        }
    }

Here is the issue. I cannot use this, because a whole Sponsor subtree may be too large to fit in memory.

var sponsor = SimpleStreamAxis(file, "Sponsor");

I cannot use this either, because I cannot tell the SponsorID from the Contract node alone.

var contract = SimpleStreamAxis(file, "Contract");

Is there a way to read the SponsorID in a Sponsor, move the cursor forward and read all the Contract nodes under that Sponsor, then move to the next Sponsor, read its SponsorID and its Contract nodes, and so on?

There are 2 answers

Alexander Petrov (best answer)

Try this:

using (XmlReader xmlReader = XmlReader.Create("file.xml"))
{
    // Each iteration advances to the next SponsorID in document order.
    while (xmlReader.ReadToFollowing("SponsorID"))
    {
        string sponsorId = xmlReader.ReadElementContentAsString();

        // process SponsorID
        Console.WriteLine(sponsorId);

        if (xmlReader.ReadToFollowing("Contract"))
        {
            do
            {
                XElement contractElement;
                // Load only this one Contract into memory. Disposing the subtree
                // reader leaves xmlReader on the Contract end tag.
                using (XmlReader contractSubtree = xmlReader.ReadSubtree())
                {
                    contractElement = XElement.Load(contractSubtree);
                }

                // process Contract
                Console.WriteLine(contractElement.Element("ContractID"));

            } while (xmlReader.ReadToNextSibling("Contract"));
        }
    }
}
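
One thing to watch with the loop above: ReadToFollowing("Contract") searches the rest of the document, so a Sponsor with no Contract elements would pick up the contracts of the next Sponsor. A minimal sketch of one way to guard against that, assuming the element names from the question, is to scope each search to a single Sponsor with ReadSubtree(). The names ContractStreamer and StreamContracts and the inline sample XML are made up for illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ContractStreamer
{
    // Hypothetical helper: yields one (sponsor ID, contract) pair at a time,
    // so only a single Contract element is held in memory.
    public static IEnumerable<(string SponsorId, XElement Contract)> StreamContracts(XmlReader reader)
    {
        while (reader.ReadToFollowing("Sponsor"))
        {
            // Restrict every search below to this one <Sponsor> element.
            using (XmlReader sponsor = reader.ReadSubtree())
            {
                if (!sponsor.ReadToFollowing("SponsorID"))
                    continue; // no ID in this sponsor: skip it

                string sponsorId = sponsor.ReadElementContentAsString();

                if (!sponsor.ReadToFollowing("Contract"))
                    continue; // sponsor without contracts

                do
                {
                    XElement contract;
                    using (XmlReader sub = sponsor.ReadSubtree())
                        contract = XElement.Load(sub);
                    yield return (sponsorId, contract);
                } while (sponsor.ReadToNextSibling("Contract"));
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        // Small stand-in for the large file; TEST02 deliberately has no contracts.
        const string xml = @"<Sender><Sponsors>
  <Sponsor><SponsorID>TEST01</SponsorID><Contracts>
    <Contract><ContractID>000001</ContractID></Contract>
    <Contract><ContractID>0000002</ContractID></Contract>
  </Contracts></Sponsor>
  <Sponsor><SponsorID>TEST02</SponsorID><Contracts /></Sponsor>
  <Sponsor><SponsorID>TEST03</SponsorID><Contracts>
    <Contract><ContractID>0000011</ContractID></Contract>
  </Contracts></Sponsor>
</Sponsors></Sender>";

        using (var reader = XmlReader.Create(new StringReader(xml)))
            foreach (var (sponsorId, contract) in ContractStreamer.StreamContracts(reader))
                Console.WriteLine($"{sponsorId}: {(string)contract.Element("ContractID")}");
    }
}
```

With this sample, TEST02 should produce no rows and the contract 0000011 should be reported under TEST03 rather than TEST02. For a real file, pass XmlReader.Create(path) instead of the StringReader.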

dbc

Yes, this can be done assuming that SponsorID always precedes the Contract nodes.

The basic idea is to read through the XML file until you find elements with the desired names "SponsorID" or "Contract", then yield them for higher-level processing:

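    public static IEnumerable<XElement> StreamNamedElements(XmlReader reader, IEnumerable<XName> names)
    {
        var nameSet = new HashSet<XName>(names);

        while (true)
        {
            if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
            {
                // XNode.ReadFrom() leaves the reader on the node after the element,
                // so do not call Read() here or an adjacent element would be skipped.
                XElement el = XNode.ReadFrom(reader) as XElement;
                if (el != null)
                    yield return el;
            }
            else if (!reader.Read())
            {
                yield break; // end of input
            }
        }
    }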

In cases where SponsorID is always present and precedes Contract, this will enumerate through these elements correctly. However, if a sponsor ID is missing or out of order, the sponsor ID from a previous sponsor might get picked up. This error can be trapped by restricting the scope of each "SponsorID" to the containing "Sponsor" element using ReadSubtree():

    public static IEnumerable<XmlReader> StreamNamedSubtrees(XmlReader reader, IEnumerable<XName> names)
    {
        var nameSet = new HashSet<XName>(names);

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
            {
                var subReader = reader.ReadSubtree();
                yield return subReader;
                ((IDisposable)subReader).Dispose(); // Be sure to advance to the end of the subtree if the caller did not.
            }
        }
    }

And then use it like:

        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr))
        {
            foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" }))
            {
                XElement sponsorID = null;
                foreach (var el in StreamNamedElements(subReader, new[] { (XName)"SponsorID", (XName)"Contract" }))
                {
                    if (el.Name == "SponsorID")
                    {
                        sponsorID = el;
                    }
                    else if (el.Name == "Contract")
                    {
                        if (sponsorID == null)
                            throw new InvalidOperationException();
                        // Example "higher processing"
                        Debug.WriteLine(string.Format("{0}: {1}", sponsorID.Value, el.ToString()));
                    }
                }
            }
        }