Word interop is extremely slow when I parse the text of a document with 100+ pages, so I rewrote my code to use the OpenXML SDK, which is much faster. My problem is that once I have found the information in the OpenXML document, I then have to locate it in the Word document and scroll the main window to it. To do this I have to somehow match each OpenXML paragraph to an interop paragraph.
I thought that interop paragraphs matched OpenXML paragraphs exactly, but I was wrong: interop usually reports more paragraphs than OpenXML. Is there any trick or piece of information that could help me match them? For example, I have found that interop usually has one extra empty paragraph after every table row. I could account for that, but I am afraid there are more cases than the one I have found.
UPDATE
Below are screenshots of a simple add-in I created to demonstrate the difference between interop and OpenXML paragraphs on a Word document with simple content like this:
The add-in retrieves the list of interop paragraphs and the list of OpenXML paragraphs and shows them side by side:
Here is the code I used:
    var document = Globals.ThisAddIn.Application.ActiveDocument;
    if (document == null)
        return;

    // Paragraph texts of the main text story, as reported by interop.
    var interopParagraphs = document
        .StoryRanges
        .Cast<Range>()
        .SingleOrDefault(r => r.StoryType == WdStoryType.wdMainTextStory)
        .Paragraphs
        .Cast<Paragraph>()
        .Select(p => p.Range.Text);

    // The same content, parsed from the flat OPC XML with the OpenXML SDK.
    var openXmlDocument = WordprocessingDocument.FromFlatOpcString(document.Content.WordOpenXML);
    if (openXmlDocument == null)
        return;

    var openXmlParagraphs = openXmlDocument
        .MainDocumentPart
        .Document
        .Body
        .Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>()
        .Select(p => p.InnerText);

    // Show both lists side by side.
    var compareDialog = new CompareForm(interopParagraphs, openXmlParagraphs);
    compareDialog.ShowDialog();
Turning my comment into an answer.
For the case of table rows, you can check to see whether you are looking at an end-of-row paragraph using Range.IsEndOfRowMark.
You can also use Range.Information[WdInformation.wdAtEndOfRowMarker].
Despite the slight difference in the documentation, the range must be collapsed for this property as well. AFAIK, they are equivalent.
I also noticed that this doesn't work if you access a paragraph directly by index, e.g. Document.Paragraphs[4]. You have to iterate through the paragraphs for it to work. This does not seem to be documented.
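As a rough illustration of the above, here is a minimal sketch that enumerates the interop paragraphs and drops the end-of-row marks, so the remaining list should line up more closely with the OpenXML paragraphs. The class and method names (ParagraphAlignment, GetContentParagraphs) are made up for this example, and collapsing to the start of the paragraph is my assumption about where the range needs to sit. I have only looked at the table-row case, so other mismatches may remain.

```csharp
using System.Collections.Generic;
using Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of any library.
public static class ParagraphAlignment
{
    // Returns the body paragraphs as seen by interop, minus end-of-row marks.
    public static List<Paragraph> GetContentParagraphs(Document document)
    {
        var result = new List<Paragraph>();

        // Enumerate instead of indexing (Paragraphs[i]); indexed access
        // did not give the right answer for me.
        foreach (Paragraph paragraph in document.Content.Paragraphs)
        {
            // IsEndOfRowMark is only meaningful on a collapsed range, so
            // probe with a collapsed copy (assumption: collapse to start).
            Range probe = paragraph.Range.Duplicate;
            object direction = WdCollapseDirection.wdCollapseStart;
            probe.Collapse(ref direction);

            if (probe.IsEndOfRowMark)
                continue; // extra "paragraph" that OpenXML does not report

            result.Add(paragraph);
        }

        return result;
    }
}
```

You could then try pairing the i-th entry of this list with the i-th OpenXML paragraph and compare the texts to check that they really correspond. Keep in mind that the interop Range.Text includes the trailing paragraph or cell marker, which InnerText does not.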