XmlException while deserializing an XML file in UTF-16 encoding


Using C#'s XmlSerializer.

While deserializing all the XML files in a given folder, XmlSerializer throws "There is an error in XML document (0, 0).", and the InnerException is an XmlException: "There is no Unicode byte order mark. Cannot switch to Unicode."

All the XML files in the directory are "UTF-16" encoded. The only difference between them is that some files omit elements that are defined in the class I deserialize into.

For example, suppose my folder contains these three kinds of XML files:

file1.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
</ns0:PaymentStatus>

file2.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
</ns0:PaymentStatus>

file3.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
<PaymentStatus2 RowNum="2" FeedID="39" Amt="26.0000" />
</ns0:PaymentStatus>

I use these classes to represent the XML above:

using System.Xml.Serialization;

[XmlType(AnonymousType = true, Namespace = "http://my.PaymentStatus")]
// The root namespace must match the xmlns:ns0 namespace used in the files exactly.
[XmlRoot("PaymentStatus", Namespace = "http://my.PaymentStatus", IsNullable = true)]
public class PaymentStatus
{
    // The child elements are unqualified (no namespace) in the files.
    // May be null when a file contains no PaymentStatus2 elements.
    [XmlElement("PaymentStatus2", Namespace = "")]
    public PaymentStatus2[] PaymentStatus2 { get; set; }
}

[XmlType(AnonymousType = true)]
[XmlRoot(Namespace = "", IsNullable = true)]
public class PaymentStatus2
{
    // Attributes missing from an element keep the type's default value (0).
    [XmlAttribute]
    public byte RowNum { get; set; }

    [XmlAttribute]
    public byte FeedID { get; set; }

    [XmlAttribute]
    public decimal Amt { get; set; }
}
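One way to check that the missing elements are not the problem is to feed the sample documents to the same classes from a string, where no byte decoding takes place. This is a minimal sketch: it repeats compact copies of the classes so it compiles on its own, and the MissingElementsDemo class and FromString method are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Compact copies of the classes above so this sketch compiles on its own.
[XmlType(AnonymousType = true, Namespace = "http://my.PaymentStatus")]
[XmlRoot("PaymentStatus", Namespace = "http://my.PaymentStatus", IsNullable = true)]
public class PaymentStatus
{
    [XmlElement("PaymentStatus2", Namespace = "")]
    public PaymentStatus2[] PaymentStatus2 { get; set; }
}

[XmlType(AnonymousType = true)]
public class PaymentStatus2
{
    [XmlAttribute] public byte RowNum { get; set; }
    [XmlAttribute] public byte FeedID { get; set; }
    [XmlAttribute] public decimal Amt { get; set; }
}

// Hypothetical helper for this demo only.
public static class MissingElementsDemo
{
    // A StringReader hands the parser characters, not bytes, so the
    // encoding="utf-16" declaration is not used to decode anything here.
    public static PaymentStatus FromString(string xml)
    {
        var serializer = new XmlSerializer(typeof(PaymentStatus));
        using (var reader = new StringReader(xml))
        {
            return (PaymentStatus)serializer.Deserialize(reader);
        }
    }
}
```

If file1 and file3 both load this way, the classes cope with absent elements and attributes, which points the investigation at how the bytes on disk are decoded.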

The following snippet does the deserialization:

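XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus)); // one instance can be reused
foreach (string f in filePaths)
{
    // Raw bytes go to XmlTextReader, which then acts on the encoding declared in the file.
    using (FileStream fs = new FileStream(f, FileMode.Open, FileAccess.Read))
    using (XmlTextReader reader = new XmlTextReader(fs))
    {
        PaymentStatus config = (PaymentStatus)xsw.Deserialize(reader);
    }
}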

Am I missing something? It has to be related to the encoding, because when I manually replace "utf-16" with "utf-8" in the files, deserialization works just fine.


There are 3 answers

Answer from John Oberreuter:

I ran into this same error today working with a third party web service.

I followed Alexei's advice: wrap the FileStream in a StreamReader with an explicit encoding, then pass the StreamReader to the XmlTextReader constructor. Here's an implementation using the code from the original question:

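XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
foreach (string f in filePaths)
{
    // StreamReader decodes the bytes (UTF-8 unless a BOM says otherwise), so
    // XmlTextReader receives characters and never acts on encoding="utf-16".
    using (FileStream fs = new FileStream(f, FileMode.Open, FileAccess.Read))
    using (StreamReader stream = new StreamReader(fs, Encoding.UTF8))
    using (XmlTextReader reader = new XmlTextReader(stream))
    {
        PaymentStatus config = (PaymentStatus)xsw.Deserialize(reader);
    }
}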
Answer from jonzim:

I don't know if this is the best way, but when my input does not contain a BOM, I load it with XDocument to handle different encodings. For example:

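using System;
using System.Xml.Linq;
using System.Xml.Serialization;

public static T DeserializeFromString<T>(String xml) where T : class
{
    try
    {
        // Parsing from a string: the declared encoding is not used to decode bytes.
        var xDoc = XDocument.Parse(xml);
        using (var xmlReader = xDoc.Root.CreateReader())
        {
            return new XmlSerializer(typeof(T)).Deserialize(xmlReader) as T;
        }
    }
    catch (Exception)
    {
        // Deliberately swallows every failure.
        return default(T);
    }
}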

Of course, you'll probably want to rethrow any exception, but in the code I copied this from I didn't need to know whether or why it failed, so I just swallowed the exception.
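If you do want to know that it failed, a middle ground is a Try-style variant that catches only the exceptions these two calls throw for bad input and lets anything else propagate. This is a sketch under that assumption; XmlStringHelper and TryDeserialize are hypothetical names, while XDocument.Parse, XNode.CreateReader and XmlSerializer.Deserialize(XmlReader) are standard .NET APIs.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical helper name, for illustration only.
public static class XmlStringHelper
{
    // Returns false instead of silently handing back default(T);
    // unexpected exception types are not caught.
    public static bool TryDeserialize<T>(string xml, out T result) where T : class
    {
        result = null;
        try
        {
            // XDocument.Parse reads characters from the string, so the declared
            // encoding does not drive any byte decoding.
            var xDoc = XDocument.Parse(xml);
            using (var xmlReader = xDoc.Root.CreateReader())
            {
                result = new XmlSerializer(typeof(T)).Deserialize(xmlReader) as T;
            }
            return result != null;
        }
        catch (XmlException)
        {
            // The text is not well-formed XML.
            return false;
        }
        catch (InvalidOperationException)
        {
            // Well-formed, but XmlSerializer could not map it onto T.
            return false;
        }
    }
}
```

Usage: `if (!XmlStringHelper.TryDeserialize(text, out PaymentStatus status)) { /* log or skip the file */ }`.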

Answer from Alexei Levenkov:

Most likely the encoding="utf-16" declaration does not match the encoding the XML files are actually stored in, which causes the parser to fail when it tries to read the stream as UTF-16 text.

Since you mention that changing the "encoding" attribute to "utf-8" lets you read the files, I assume they are actually UTF-8. You can easily verify that by opening the files in a binary (hex) view instead of as text in your editor of choice (e.g. Visual Studio).
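The same check can be done in code by looking at the first few bytes. A UTF-16 LE file normally starts with FF FE (or 3C 00 if it has no BOM), UTF-8 with a BOM starts with EF BB BF, and UTF-8 without a BOM starts directly with 3C 3F 78 6D ("<?xm"). This is a rough diagnostic sketch, not the full detection algorithm from the XML specification; it ignores UTF-32 and other encodings, and EncodingSniffer is a hypothetical name.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper; only covers the common UTF-8/UTF-16 cases.
public static class EncodingSniffer
{
    public static string Describe(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8 with BOM";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE with BOM";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE with BOM";
        if (head.Length >= 2 && head[0] == 0x3C && head[1] == 0x00)
            return "UTF-16 LE without BOM";
        if (head.Length >= 2 && head[0] == 0x00 && head[1] == 0x3C)
            return "UTF-16 BE without BOM";
        if (head.Length >= 1 && head[0] == 0x3C)
            return "UTF-8 (or ASCII) without BOM";
        return "unknown";
    }

    // Reads only the first four bytes of the file.
    public static string DescribeFile(string path)
    {
        var head = new byte[4];
        int read;
        using (var fs = File.OpenRead(path))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);
        return Describe(head);
    }
}
```

A file that reports "UTF-8 (or ASCII) without BOM" while declaring encoding="utf-16" is exactly the mismatch behind the exception.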

The most likely way to get such a mismatch is saving the XML as writer.Write(document.OuterXml): getting the string representation first puts "utf-16" in the declaration, but the string is then written to the stream with the writer's default UTF-8 encoding.
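A closely related way to reproduce the problem, sketched here with XmlSerializer rather than XmlDocument.OuterXml: serializing to a StringWriter produces an encoding="utf-16" declaration (StringWriter reports UTF-16 as its encoding), and File.WriteAllText then saves that string as UTF-8 without a BOM. Reading the file back the way the question does fails with the same InvalidOperationException wrapping an XmlException. MismatchRepro and its methods are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical repro helper, for illustration only.
public static class MismatchRepro
{
    // Declaration says utf-16 (from StringWriter), bytes are UTF-8 (from File.WriteAllText).
    public static void WriteMislabelled<T>(string path, T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var sw = new StringWriter())
        {
            serializer.Serialize(sw, value);
            File.WriteAllText(path, sw.ToString());
        }
    }

    // Same pattern as the question: raw bytes into XmlTextReader, which
    // sees encoding="utf-16" with no BOM and refuses to switch.
    public static T ReadLikeQuestion<T>(string path)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new XmlTextReader(fs))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```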

A possible workaround is to read the XML in a way that is symmetrical to the code that wrote it: read the file as a string, then load the XML from that string.
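A minimal sketch of that workaround, assuming the files are UTF-8 unless they begin with a BOM: File.ReadAllText honours a BOM when present and otherwise decodes as UTF-8, and an XmlReader created over a StringReader ignores the encoding named in the declaration. ReadAsStringWorkaround and Load are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper name, for illustration only.
public static class ReadAsStringWorkaround
{
    public static T Load<T>(string path)
    {
        // Decode the bytes ourselves (BOM if present, otherwise UTF-8).
        string text = File.ReadAllText(path);
        var serializer = new XmlSerializer(typeof(T));
        // The reader now sees characters, so encoding="utf-16" is not acted on.
        using (var sr = new StringReader(text))
        using (var reader = XmlReader.Create(sr))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

In the question's loop this would be `PaymentStatus config = ReadAsStringWorkaround.Load<PaymentStatus>(f);`, which also still reads genuine UTF-16 files that start with a BOM.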

The proper fix is to make sure the XML is stored in the encoding that its declaration states.
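One way to do that, sketched under the assumption that you control the code that writes the files: serialize directly to the file through an XmlWriter, so the same XmlWriterSettings.Encoding both encodes the bytes and is named in the declaration. ConsistentWriter and Save are hypothetical names; XmlWriter.Create(string, XmlWriterSettings) and XmlWriterSettings.Encoding are standard .NET APIs.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper name, for illustration only.
public static class ConsistentWriter
{
    // The writer's encoding is used for the bytes (and BOM, if the encoding has one)
    // and for the encoding="..." attribute, so the two cannot disagree.
    public static void Save<T>(string path, T value, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };
        using (var writer = XmlWriter.Create(path, settings))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, value);
        }
    }
}
```

Files written this way, whether with `new UTF8Encoding(false)` or `Encoding.Unicode`, load with the question's original FileStream plus XmlTextReader code.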