Read Parts of an XML File Through a Stream Instead of All at Once


So I've been working on an old piece of code for a project, and I've managed to optimize it for 64-bit use. But there's one issue: XmlSerializer.Deserialize breaks because the input text being deserialized is TOO BIG (it overflows, exceeding the 2 GB int limit).

I've tried to find a fix, but no answer was helpful.

Here's the code in question.

if (File.Exists(dir + "/" + fileName))
{
    string XmlString = File.ReadAllText(dir + "/" + fileName, Encoding.UTF8);
    BXML_LIST deserialized;
    using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(XmlString)))
    {
        using (XmlTextReader xmlTextReader = new XmlTextReader(input))
        {
            xmlTextReader.Normalization = false;
            XmlSerializer xmlSerializer = new XmlSerializer(typeof(BXML_LIST));
            deserialized = (BXML_LIST)xmlSerializer.Deserialize(xmlTextReader);
        }
    }
    xml_list.Add(deserialized);
}

Based on many questions asked here, I thought I could "split" the XML file into parts (WHILE KEEPING THE SAME BXML_LIST TYPE), deserialize each part, and finally combine the results so they match the original content. That way the whole file never has to be deserialized in one go, which is what causes the overflow.

Thing is, I have no idea how to implement this. Any help or guidance would be amazing!
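A minimal sketch of that split-then-combine idea, under a loudly stated assumption: the question never shows BXML_LIST's members, so this pretends the file is a BXML_LIST root wrapping many repeated Item children. BxmlItem, Item, Id, Value and the BXML_LIST stand-in below are all hypothetical placeholders. Rather than physically splitting the file, it streams it with XmlReader, deserializes one Item at a time with XmlSerializer (via ReadSubtree), and adds each item to a single BXML_LIST.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;

// HYPOTHETICAL stand-ins: the real BXML_LIST layout is not shown in the question.
// Assumed file shape: <BXML_LIST><Item Id="1" Value="..."/><Item .../>...</BXML_LIST>
[XmlRoot("Item")]
public class BxmlItem
{
    [XmlAttribute] public int Id { get; set; }
    [XmlAttribute] public string Value { get; set; }
}

[XmlRoot("BXML_LIST")]
public class BXML_LIST
{
    [XmlElement("Item")] public List<BxmlItem> Items { get; set; } = new List<BxmlItem>();
}

public static class StreamingBxml
{
    // "Split" step: deserializes one <Item> at a time instead of the whole document.
    public static IEnumerable<BxmlItem> ReadItems(string path)
    {
        var serializer = new XmlSerializer(typeof(BxmlItem));
        using (XmlReader reader = XmlReader.Create(path))
        {
            // ReadToFollowing advances to the next <Item> start tag.
            while (reader.ReadToFollowing("Item"))
            {
                // ReadSubtree restricts the serializer to this one element; disposing
                // the subtree reader leaves 'reader' at the end of that element.
                using (XmlReader itemReader = reader.ReadSubtree())
                {
                    yield return (BxmlItem)serializer.Deserialize(itemReader);
                }
            }
        }
    }

    // "Combine" step: collects the streamed items back into one BXML_LIST.
    public static BXML_LIST Load(string path)
    {
        var result = new BXML_LIST();
        foreach (BxmlItem item in ReadItems(path))
            result.Items.Add(item);
        return result;
    }
}
```

Streaming keeps the raw text out of memory, but Load still holds every deserialized item. If the combined object graph is itself too large, consuming ReadItems one item at a time (without collecting them) avoids that as well.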

// Edit 1:

I found this piece of code on another site, but I don't know whether it's a reliable way to combine the split XML files:

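using System.Linq;
using System.Xml.Linq;

var xml1 = XDocument.Load("file1.xml");
var xml2 = XDocument.Load("file2.xml");
//Combine and remove duplicates. XElement compares by reference by default, so
//XNode.EqualityComparer is needed to treat elements with identical content as duplicates
var combinedUnique = xml1.Descendants("AllNodes")
                          .Union(xml2.Descendants("AllNodes"), XNode.EqualityComparer);
//Combine and keep duplicates
var combinedWithDups = xml1.Descendants("AllNodes")
                           .Concat(xml2.Descendants("AllNodes"));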

Answer by Alexander Petrov (best answer):

Your code gives me the creeps: it uses memory very inefficiently.

string XmlString = File.ReadAllText(...) - here you load the entire file into memory for the first time, as a UTF-16 string (two bytes per character for most text).

Encoding.UTF8.GetBytes(XmlString) - here you spend memory on the same data a second time, as a byte array.

new MemoryStream(...) - this wraps that byte array rather than copying it, but the whole document is still sitting in memory instead of being read from disk as needed.

xmlSerializer.Deserialize - here memory is spent again on the deserialized objects, but there's no getting away from that.


Write it like this instead:

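BXML_LIST deserialized;
// XmlReader opens the file as a stream; no full in-memory copy of the text is made
using (XmlReader xmlReader = XmlReader.Create(Path.Combine(dir, fileName)))
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(BXML_LIST));
    deserialized = (BXML_LIST)xmlSerializer.Deserialize(xmlReader);
}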

In this case, xmlSerializer reads the data from the file through xmlReader as a stream, piece by piece, so the file's contents are never held in memory as one huge string or byte array.

Perhaps this is enough to solve your problem.
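For reference, here is a sketch of the question's whole snippet with the same streaming fix, assuming the File.Exists check and the Normalization = false setting should be kept. XmlReaderSettings has no Normalization option, so this uses XmlTextReader's file-path constructor, which also reads the file incrementally. The helper name XmlLoader.AddIfExists is hypothetical, and it is generic so it does not depend on BXML_LIST's (unshown) members.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class XmlLoader
{
    // Deserializes dir/fileName into a T and appends it to target, if the file exists.
    // The file is streamed into XmlSerializer, so no string, byte[] or MemoryStream
    // copy of the whole document is created.
    public static void AddIfExists<T>(string dir, string fileName, List<T> target)
    {
        string path = Path.Combine(dir, fileName);
        if (!File.Exists(path))
            return;

        // XmlTextReader(string) opens the file itself and reads it as it goes.
        using (var xmlTextReader = new XmlTextReader(path))
        {
            // Same setting as the question's original code.
            xmlTextReader.Normalization = false;
            var xmlSerializer = new XmlSerializer(typeof(T));
            target.Add((T)xmlSerializer.Deserialize(xmlTextReader));
        }
    }
}
```

With the question's variables this would be called as XmlLoader.AddIfExists(dir, fileName, xml_list), assuming xml_list is a List<BXML_LIST>. One difference from the original: the encoding is detected from the file's BOM or XML declaration (UTF-8 when neither is present) instead of being forced to UTF-8.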