I have a class called Reader:
public class Reader
Here is the constructor:
public Reader(string fileName)
{
    using (Package package = Package.Open(AppDomain.CurrentDomain.BaseDirectory + "\\" + fileName + ".docx"))
    {
        Document = new XmlDocument();
        Document.Load(package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream());
        xmlNamespaceManager = new XmlNamespaceManager(Document.NameTable);
        xmlNamespaceManager.AddNamespace("w", @"http://schemas.microsoft.com/office/word/2006/wordml");
    }
}
There's also a public method called ReadTextNodes, which I set up as a test:
public void ReadTextNodes()
{
    var nodes = Document.SelectNodes("//w:t", xmlNamespaceManager);
    Console.WriteLine(nodes.Count);
    foreach (XmlNode node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}
The XPath I used is "//w:t". I mapped the prefix "w" to the XML namespace I believed Word uses ("http://schemas.microsoft.com/office/word/2006/wordml"). Yet this query returns zero nodes. When I replace it with "//*", the console quickly fills up with text. So what's wrong with the first query?
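I figured out that I was using the wrong namespace URI. I saved the docx file as an XML file and opened it in Visual Studio, where I found that "w" is actually mapped to "http://schemas.openxmlformats.org/wordprocessingml/2006/main" and not to "http://schemas.microsoft.com/office/word/2006/wordml". XPath matches elements by namespace URI rather than by prefix, so a "w" prefix bound to the wrong URI matches nothing, while "//*" matches every element regardless of namespace.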
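To illustrate the mismatch, here is a minimal sketch that runs the same "//w:t" query against a small inline XML string instead of a real docx package. The class name NamespaceDemo, the helper CountTextNodes, and the sample XML are my own assumptions for illustration and are not part of the Reader class. It also shows one way to read the URI bound to "w" straight from the loaded document rather than hard-coding it.

```csharp
using System;
using System.Xml;

public static class NamespaceDemo
{
    // Namespace URI found in document.xml, as described above.
    public const string MainNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    // The URI that was originally registered and matched nothing.
    public const string WrongNs = "http://schemas.microsoft.com/office/word/2006/wordml";

    // Hypothetical stand-in for word/document.xml, trimmed to one text run.
    public const string SampleXml =
        "<w:document xmlns:w=\"" + MainNs + "\">" +
        "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>" +
        "</w:document>";

    // Counts the nodes matched by //w:t when "w" is bound to namespaceUri.
    public static int CountTextNodes(string xml, string namespaceUri)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("w", namespaceUri);
        return doc.SelectNodes("//w:t", manager).Count;
    }

    // Reads the URI that the document itself binds to the "w" prefix.
    public static string NamespaceOfW(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetNamespaceOfPrefix("w");
    }

    public static void Main()
    {
        Console.WriteLine(CountTextNodes(SampleXml, WrongNs)); // 0
        Console.WriteLine(CountTextNodes(SampleXml, MainNs));  // 1
        Console.WriteLine(NamespaceOfW(SampleXml));            // the MainNs URI
    }
}
```

In the Reader constructor, the equivalent fix would be to pass the openxmlformats URI to AddNamespace. Looking the URI up with GetNamespaceOfPrefix only works if the root element declares the "w" prefix, which I am assuming here.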