Check if word exists in docx file

I have this docx file loaded in my code:

byte[] documentBytes = File.ReadAllBytes("C:\\mydocument.docx");

This document contains the word "foo" in the main body, a header, or a footer. What is the easiest way to check whether the word "foo" exists in it?

Answer by yesman

Using Open-Xml-PowerTools (the OpenXmlPowerTools NuGet package):

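using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;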

...

byte[] documentBytes = GetMyBytes(); // e.g. File.ReadAllBytes("C:\\mydocument.docx"), or any other source of the docx bytes
using var myStream = new MemoryStream(documentBytes, false); // read-only stream over the bytes
using var myDocument = WordprocessingDocument.Open(myStream, false); // a file path string can be passed instead of myStream

var regex = new Regex("foo");

int headerCount = OpenXmlRegex.Match(myDocument.MainDocumentPart.HeaderParts.SelectMany(x => x.GetXDocument().Descendants(W.p)), regex);
int footerCount = OpenXmlRegex.Match(myDocument.MainDocumentPart.FooterParts.SelectMany(x => x.GetXDocument().Descendants(W.p)), regex);
int bodyCount = OpenXmlRegex.Match(myDocument.MainDocumentPart.GetXDocument().Descendants(W.p), regex);

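The variables headerCount, footerCount and bodyCount hold the number of hits for your regex in the headers, the footers and the main body respectively. The word exists in the document if any of the three is greater than zero. Note that the pattern "foo" also matches inside longer words such as "food"; use @"\bfoo\b" if only the whole word should count. The MainDocumentPart property also exposes other parts of the document, such as images, charts and themes.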
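
Since the question only asks whether the word exists, the three counts can be folded into one yes/no check. The following is a minimal sketch, not a definitive implementation: the class DocxWordSearch and the method ContainsWord are hypothetical names made up for illustration, and the whole-word \b anchors are an assumption about what "the word foo" should mean. It relies only on the calls already used in the answer above.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical helper for illustration; not part of Open-Xml-PowerTools.
public static class DocxWordSearch
{
    // Returns true if `word` occurs as a whole word in the main body,
    // any header, or any footer of the docx given as a byte array.
    public static bool ContainsWord(byte[] docxBytes, string word)
    {
        // Assumption: whole-word, case-sensitive matching is wanted.
        // Regex.Escape treats the search term literally; the \b anchors
        // keep "foo" from matching inside "food".
        var regex = new Regex(@"\b" + Regex.Escape(word) + @"\b");

        using var stream = new MemoryStream(docxBytes, false);
        using var document = WordprocessingDocument.Open(stream, false);

        var mainPart = document.MainDocumentPart;
        if (mainPart == null)
        {
            return false; // no main document part, so nothing to search
        }

        // Gather the body, header and footer parts into one sequence,
        // then collect every paragraph (w:p) element from each of them.
        var parts = new OpenXmlPart[] { mainPart }
            .Concat(mainPart.HeaderParts)
            .Concat(mainPart.FooterParts);
        var paragraphs = parts.SelectMany(p => p.GetXDocument().Descendants(W.p));

        // OpenXmlRegex.Match returns the number of hits; any hit means
        // the word is present somewhere in the document.
        return OpenXmlRegex.Match(paragraphs, regex) > 0;
    }
}
```

With the byte array from the question, the call would be DocxWordSearch.ContainsWord(documentBytes, "foo"). For a case-insensitive search, RegexOptions.IgnoreCase can be passed as a second argument to the Regex constructor.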