Writing compact XML with XmlDictionaryWriter.CreateBinaryWriter and an XmlDictionary


I want to write an XML document to disk in a compact format. To this end, I use the .NET Framework method XmlDictionaryWriter.CreateBinaryWriter(Stream stream, IXmlDictionary dictionary).

This method writes a custom compact binary XML representation that can later be read by XmlDictionaryReader.CreateBinaryReader. The method accepts an XmlDictionary that can contain common strings, so that those strings do not have to be written to the output each time. Instead of the string, its dictionary index is written to the file. CreateBinaryReader can later use the same dictionary to reverse the process.

However, the dictionary I pass is apparently not used. Consider this code:

using System.IO;
using System.Xml;
using System.Xml.Linq;

class Program
{
    public static void Main()
    {
        XmlDictionary dict = new XmlDictionary();
        dict.Add("myLongRoot");
        dict.Add("myLongAttribute");
        dict.Add("myLongValue");
        dict.Add("myLongChild");
        dict.Add("myLongText");

        XDocument xdoc = new XDocument();
        xdoc.Add(new XElement("myLongRoot",
                                new XAttribute("myLongAttribute", "myLongValue"),
                                new XElement("myLongChild", "myLongText"),
                                new XElement("myLongChild", "myLongText"),
                                new XElement("myLongChild", "myLongText")
                                ));

        using (Stream stream = File.Create("binaryXml.txt"))
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
        {
            xdoc.WriteTo(writer);
        }
    }
}

The output produced is this (binary control characters not shown):

@
myLongRootmyLongAttribute˜myLongValue@myLongChild™
myLongText@myLongChild™
myLongText@myLongChild™
myLongText

So apparently the XmlDictionary has not been used. All strings appear in their entirety in the output, even multiple times.

This problem is not limited to XDocument. In the minimal example above I used an XDocument to demonstrate it, but I originally stumbled upon it while using XmlDictionaryWriter in conjunction with a DataContractSerializer, which is the common use case. The results were the same:

[Serializable]
public class myLongChild
{
    public double myLongText = 0;
}
...
using (Stream stream = File.Create("binaryXml.txt"))
using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
{
    var dcs = new DataContractSerializer(typeof(myLongChild));
    dcs.WriteObject(writer, new myLongChild());
}

The resulting output did not use my XmlDictionary.

How can I get XmlDictionaryWriter to use the supplied XmlDictionary?

Or have I misunderstood how this works?

With the DataContractSerializer approach, I tried stepping into the .NET Framework source (Visual Studio: Options > Debugging > Enable .NET Framework source stepping). The writer does attempt to look up each of the above strings in the dictionary, as expected. However, the lookup fails in line 356 of XmlBinaryWriter.cs, for reasons that are not clear to me.

Alternatives I have considered:

  • There is an overload of XmlDictionaryWriter.CreateBinaryWriter that also accepts an XmlBinaryWriterSession. The writer then adds any new strings it encounters to the session dictionary. However, I want to use only a static dictionary, known beforehand, for both reading and writing.

  • I could wrap the whole thing in a GZipStream and let the compression take care of the repeated strings. However, this would not compress the first instance of each string, and it seems like a clumsy workaround overall.

Answer by Ondrej Svejdar

Yes, there is a misunderstanding. XmlDictionaryWriter is primarily used for serialization of objects, and it is a subclass of XmlWriter. XDocument.WriteTo(XmlWriter writer) takes a plain XmlWriter as its argument. The call XmlDictionaryWriter.CreateBinaryWriter creates an instance of System.Xml.XmlBinaryNodeWriter internally. This class has both methods for "regular" writing:

// override of XmlWriter
public override void WriteStartElement(string prefix, string localName)
{
  // plain old "xml" for me please
}

and for the dictionary-based approach:

// override of XmlDictionaryWriter
public override void WriteStartElement(string prefix, XmlDictionaryString localName)
{
  // look localName up in the dictionary and write its key instead of the string
}

The latter is mostly used when you serialize objects via DataContractSerializer (notice that its WriteObject method has overloads taking either an XmlDictionaryWriter or an XmlWriter), while XDocument.WriteTo takes just an XmlWriter, so only the string-based overloads get called.
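The difference between the two kinds of overloads can be seen without DataContractSerializer at all. Below is a minimal sketch (the class and method names are hypothetical, not framework APIs) that writes the same small document once through the string overloads and once through the XmlDictionaryString overloads, taking the XmlDictionaryString instances from the very XmlDictionary passed to CreateBinaryWriter. It assumes writer and reader build the dictionary with the same strings in the same order, since only the integer keys end up in the output.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo class: the same document written with two kinds of overloads.
public static class DictionaryOverloadDemo
{
    // Writer and reader must build identical dictionaries: only the
    // integer keys (not the strings) are stored in the binary output.
    public static XmlDictionary CreateDictionary()
    {
        var dict = new XmlDictionary();
        dict.Add("myLongRoot");
        dict.Add("myLongChild");
        dict.Add("myLongText");
        return dict;
    }

    // Writes <myLongRoot><myLongChild>myLongText</myLongChild></myLongRoot>.
    public static byte[] Write(bool useDictionaryOverloads)
    {
        XmlDictionary dict = CreateDictionary();
        var stream = new MemoryStream();
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
        {
            if (useDictionaryOverloads)
            {
                // Strings owned by the writer's own dictionary:
                // the binary writer can emit their keys instead of the text.
                dict.TryLookup("myLongRoot", out XmlDictionaryString root);
                dict.TryLookup("myLongChild", out XmlDictionaryString child);
                dict.TryLookup("myLongText", out XmlDictionaryString text);
                writer.WriteStartElement(root, XmlDictionaryString.Empty);
                writer.WriteStartElement(child, XmlDictionaryString.Empty);
                writer.WriteString(text);
            }
            else
            {
                // Plain XmlWriter overloads (the kind XDocument.WriteTo uses):
                // every string is written out in full.
                writer.WriteStartElement("myLongRoot");
                writer.WriteStartElement("myLongChild");
                writer.WriteString("myLongText");
            }
            writer.WriteEndElement();
            writer.WriteEndElement();
        }
        // The writer may close the stream on Dispose; MemoryStream.ToArray still works then.
        return stream.ToArray();
    }

    // Reads the binary data back using an identically built dictionary.
    public static XElement Read(byte[] data)
    {
        using (var reader = XmlDictionaryReader.CreateBinaryReader(
            data, CreateDictionary(), XmlDictionaryReaderQuotas.Max))
        {
            return XElement.Load(reader);
        }
    }
}
```

Write(true) should produce a noticeably shorter byte array than Write(false), and Read(Write(true)) should give back the original element.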

As for your problem: if I were you, I'd write my own XmlWriter that wraps the XmlDictionaryWriter:

class CustomXmlWriter : XmlWriter
{
  private readonly XmlDictionaryWriter _writer;
  public CustomXmlWriter(XmlDictionaryWriter writer)
  {
    _writer = writer;
  }
  // override XmlWriter methods to use the dictionary-based approach instead
}
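Subclassing XmlWriter means implementing many abstract members. A lighter alternative in the same spirit is sketched below: a hypothetical helper, WriteWithDictionary (not a framework API), that walks an XElement tree itself and calls the XmlDictionaryString overloads whenever a name or value is found in the dictionary, falling back to plain strings otherwise. It assumes the document uses no namespaces and ignores comments and processing instructions.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class DictionaryXElementWriter
{
    // Hypothetical helper: writes 'element' to 'writer', passing every local name,
    // attribute value or text value found in 'dict' as its dictionary entry.
    // Assumption: the document does not use XML namespaces.
    public static void WriteWithDictionary(XmlDictionaryWriter writer, XmlDictionary dict, XElement element)
    {
        string name = element.Name.LocalName;
        if (dict.TryLookup(name, out XmlDictionaryString dictName))
            writer.WriteStartElement(dictName, XmlDictionaryString.Empty); // key is written
        else
            writer.WriteStartElement(name);                                 // text is written

        foreach (XAttribute attr in element.Attributes())
        {
            if (attr.IsNamespaceDeclaration)
                continue; // namespaces are out of scope for this sketch

            string attrName = attr.Name.LocalName;
            if (dict.TryLookup(attrName, out XmlDictionaryString dictAttr))
                writer.WriteStartAttribute(dictAttr, XmlDictionaryString.Empty);
            else
                writer.WriteStartAttribute(attrName);
            WriteValue(writer, dict, attr.Value);
            writer.WriteEndAttribute();
        }

        foreach (XNode node in element.Nodes())
        {
            if (node is XElement child)
                WriteWithDictionary(writer, dict, child);
            else if (node is XText text) // XCData derives from XText; written as plain text here
                WriteValue(writer, dict, text.Value);
            // Comments and processing instructions are skipped.
        }

        writer.WriteEndElement();
    }

    // Writes a text or attribute value, using the dictionary entry when there is one.
    private static void WriteValue(XmlDictionaryWriter writer, XmlDictionary dict, string value)
    {
        if (dict.TryLookup(value, out XmlDictionaryString dictValue))
            writer.WriteString(dictValue);
        else
            writer.WriteString(value);
    }
}
```

With the dictionary from the question, DictionaryXElementWriter.WriteWithDictionary(writer, dict, xdoc.Root) would take the place of xdoc.WriteTo(writer); the reader side can use XmlDictionaryReader.CreateBinaryReader with the same dictionary.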

UPDATE (based on your comment)

If you are indeed using DataContractSerializer, there are a few mistakes in your code.

1) Decorate the POCO classes with the [DataContract] and [DataMember] attributes (a [DataMember] can be a property, as below, or a field). Also set the namespace to an empty string, or you'll have to deal with namespaces in your dictionary as well. Like this:

using System.Collections.Generic;
using System.Runtime.Serialization;

namespace XmlStuff {
  [DataContract(Namespace = "")]
  public class myLongChild
  {
    [DataMember]
    public double myLongText { get; set; }
  }

  [DataContract(Namespace = "")]
  public class myLongRoot
  {
    [DataMember]
    public IList<myLongChild> Items { get; set; }
  }
}

2) Provide a session instance as well; with a null session, the dictionary writer falls back to the default (XmlWriter-like) plain-string output:

// order matters - add new items only at the bottom
static readonly string[] s_Terms = new string[]
{
    "myLongRoot", "myLongChild", "myLongText", 
    "http://www.w3.org/2001/XMLSchema-instance", "Items"
};

public class CustomXmlBinaryWriterSession : XmlBinaryWriterSession
{
  private bool m_Lock;
  public void Lock() { m_Lock = true; }

  public override bool TryAdd(XmlDictionaryString value, out int key)
  {
    if (m_Lock)
    {
      key = -1;
      return false;
    }

    return base.TryAdd(value, out key);
  }
}

static void InitializeWriter(out XmlDictionary dict, out XmlBinaryWriterSession session)
{
  dict = new XmlDictionary();
  var result = new CustomXmlBinaryWriterSession();
  var key = 0;
  foreach(var term in s_Terms)
  {
    result.TryAdd(dict.Add(term), out key);
  }
  result.Lock();
  session = result;
}

static void InitializeReader(out XmlDictionary dict, out XmlBinaryReaderSession session)
{
  dict = new XmlDictionary();
  var result = new XmlBinaryReaderSession();
  for (var i = 0; i < s_Terms.Length; i++)
  {
    result.Add(i, s_Terms[i]);
  }
  session = result;
}

static void Main(string[] args)
{
  XmlDictionary dict;
  XmlBinaryWriterSession session;
  InitializeWriter(out dict, out session);

  var root = new myLongRoot { Items = new List<myLongChild>() };
  root.Items.Add(new myLongChild { myLongText = 24 });
  root.Items.Add(new myLongChild { myLongText = 25 });
  root.Items.Add(new myLongChild { myLongText = 27 });

  byte[] buffer;
  using (var stream = new MemoryStream())
  {
    using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict, session))
    {
      var dcs = new DataContractSerializer(typeof(myLongRoot));
      dcs.WriteObject(writer, root);
    }
    buffer = stream.ToArray();
  }


  XmlBinaryReaderSession readerSession;
  InitializeReader(out dict, out readerSession);
  using (var stream = new MemoryStream(buffer, false))
  {
    using (var reader = XmlDictionaryReader.CreateBinaryReader(stream, dict, new XmlDictionaryReaderQuotas(), readerSession))
    {
      var dcs = new DataContractSerializer(typeof(myLongRoot));
      var rootCopy = dcs.ReadObject(reader);
    }
  }
}
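As for the empty namespace in point 1: without Namespace = "", DataContractSerializer puts the elements into a default namespace built from http://schemas.datacontract.org/2004/07/ plus the CLR namespace name, which is one more long string for the dictionary to cover. This minimal sketch, with hypothetical type names, serializes two otherwise identical contracts to ordinary text XML so the difference is visible:

```csharp
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

namespace XmlStuffDemo
{
    // Hypothetical contract using the default data contract namespace.
    [DataContract]
    public class DefaultNamespaceChild
    {
        [DataMember]
        public double myLongText { get; set; }
    }

    // Same shape, but with the empty namespace suggested in point 1.
    [DataContract(Namespace = "")]
    public class EmptyNamespaceChild
    {
        [DataMember]
        public double myLongText { get; set; }
    }

    public static class NamespaceDemo
    {
        // Serializes to ordinary text XML so the namespace declarations are readable.
        public static string ToXml<T>(T value)
        {
            var sb = new StringBuilder();
            using (var writer = XmlWriter.Create(sb))
            {
                new DataContractSerializer(typeof(T)).WriteObject(writer, value);
            }
            return sb.ToString();
        }
    }
}
```

Only the output for DefaultNamespaceChild should mention schemas.datacontract.org.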
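Why point 2 helps, and presumably why the lookup at line 356 of XmlBinaryWriter.cs fails in the question, can be isolated in a small sketch. It assumes (based on the framework's reference source rather than on documentation) that the binary writer only uses an XmlDictionaryString as a dictionary key when it belongs to the very XmlDictionary instance passed to CreateBinaryWriter, while a writer session can also match a string by its text. DataContractSerializer presumably takes its names from its own internal dictionary, which otherDict stands in for here. LockedWriterSession, WriteElement and RunScenarios are hypothetical names; the session class mirrors the answer's CustomXmlBinaryWriterSession.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Same idea as CustomXmlBinaryWriterSession above: after Lock(),
// no new strings are accepted, so only pre-registered ones get keys.
public class LockedWriterSession : XmlBinaryWriterSession
{
    private bool _locked;
    public void Lock() { _locked = true; }

    public override bool TryAdd(XmlDictionaryString value, out int key)
    {
        if (_locked) { key = -1; return false; }
        return base.TryAdd(value, out key);
    }
}

public static class ForeignDictionaryDemo
{
    // Writes a single empty element named 'name' and returns the bytes.
    public static byte[] WriteElement(XmlDictionary writerDict, XmlDictionaryString name, XmlBinaryWriterSession session)
    {
        var stream = new MemoryStream();
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, writerDict, session))
        {
            writer.WriteStartElement(name, XmlDictionaryString.Empty);
            writer.WriteEndElement();
        }
        return stream.ToArray();
    }

    // Returns the bytes written for each scenario, keyed by a short label.
    public static Dictionary<string, byte[]> RunScenarios()
    {
        var writerDict = new XmlDictionary();
        XmlDictionaryString own = writerDict.Add("myLongRoot");

        // Same text, different dictionary: stands in for the serializer's own names.
        var otherDict = new XmlDictionary();
        XmlDictionaryString foreign = otherDict.Add("myLongRoot");
        XmlDictionaryString unregistered = otherDict.Add("somethingElse");

        var session = new LockedWriterSession();
        session.TryAdd(own, out _); // pre-register "myLongRoot", then freeze the session
        session.Lock();

        return new Dictionary<string, byte[]>
        {
            ["own"] = WriteElement(writerDict, own, null),
            ["foreign"] = WriteElement(writerDict, foreign, null),
            ["foreignWithSession"] = WriteElement(writerDict, foreign, session),
            ["unregisteredWithSession"] = WriteElement(writerDict, unregistered, session),
        };
    }
}
```

Under those assumptions, the text "myLongRoot" should appear in the output only in the "foreign" scenario, and "somethingElse" should be written as text because the locked session refuses to add it.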