C#: Problem with my XPath? Parsing XML from a DOCX file using Package.GetPart

I have a class called Reader

public class Reader

Here is the constructor

public Reader(string fileName)
{
    using (Package package = Package.Open(AppDomain.CurrentDomain.BaseDirectory + "\\" + fileName + ".docx"))
    {
        Document = new XmlDocument();
        Document.Load(package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream());
        xmlNamespaceManager = new XmlNamespaceManager(Document.NameTable);
        xmlNamespaceManager.AddNamespace("w", @"http://schemas.microsoft.com/office/word/2006/wordml");
    }
}

There's also a public method called ReadTextNodes, which I have set up to test.

public void ReadTextNodes()
{
    var nodes = Document.SelectNodes("//w:t", xmlNamespaceManager);
    Console.WriteLine(nodes.Count);
    foreach (XmlNode node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}

The XPath I have used is "//w:t". I have mapped the prefix "w" to the XML namespace I believed Word uses ("http://schemas.microsoft.com/office/word/2006/wordml"), yet this query returns zero nodes. When I replace it with "//*", the console fills up very quickly with text, so the document itself is loading. So what's wrong with the first query?

There is 1 answer

LeiMagnus On

I figured out I was using the wrong namespace URI. I saved the docx file as an XML file and opened it in Visual Studio, where I found that "w" is actually mapped to "http://schemas.openxmlformats.org/wordprocessingml/2006/main" and not to "http://schemas.microsoft.com/office/word/2006/wordml". XPath matches elements by namespace URI rather than by prefix, so with the wrong URI registered "//w:t" matched nothing.
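
For illustration, here is a minimal sketch of the corrected lookup. The class and method names (WordTextSketch, ReadTextNodes, NamespaceOfPrefix) are hypothetical and not part of the original Reader class; the only thing taken from the answer is the namespace URI. The second helper is an assumption about how one might check the URI from code instead of opening the XML in Visual Studio.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class WordTextSketch
{
    // Namespace URI that the answer found bound to the "w" prefix.
    public const string WordMl =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the text of every w:t element.
    public static List<string> ReadTextNodes(XmlDocument document)
    {
        var ns = new XmlNamespaceManager(document.NameTable);
        // The prefix chosen here is arbitrary; only the URI must match the document.
        ns.AddNamespace("w", WordMl);

        var texts = new List<string>();
        foreach (XmlNode node in document.SelectNodes("//w:t", ns))
        {
            texts.Add(node.InnerText);
        }
        return texts;
    }

    // Hypothetical helper: reads the URI that the document itself binds to a
    // prefix on its root element (an empty string means the prefix is unbound).
    public static string NamespaceOfPrefix(XmlDocument document, string prefix)
    {
        return document.DocumentElement.GetNamespaceOfPrefix(prefix);
    }
}
```

Called on the XmlDocument loaded in the constructor, NamespaceOfPrefix(Document, "w") should report the URI the file really uses, which can then be compared with the one passed to AddNamespace.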