Reading the body text of an rdocx object with officer

I am trying to read the body of a .docx file with the officer package, but I am running into errors:

library(officer)

docx1 <- system.file(package = "officer", "template.docx")
content <- docx_summary(docx1)

Error in x$doc_obj : $ operator is invalid for atomic vectors

docx2 <- read_docx("template.docx")
content <- docx_summary(docx2)

Error in data.frame(level = as.integer(xml_attr(xml_child(node, "w:pPr/w:numPr/w:ilvl"), : arguments imply differing number of rows: 1, 0

length(docx1) 
# 1
length(docx2) 
# 37

When I print docx2, I get some information, including all the styles, followed by this:

text                                  
1.1                                   
Question 10:                          
1.4                                   
Some text here also                   
1.7                                   
Text for a heading                    
1.10                                  
1.13                                  
10.1                                  
1.16                                  
10.2                                  
1.19                                  
2.2                                   
<NA>                                  
2.5                                   
<NA>                                  
2.8                                   
<NA>                                  
2.11                                  
1 of 2 questions correct-50%          

All of the text above is in fact in the body of the document I am trying to read. It is quite scrambled, but it is what I am hoping to extract, in the correct order.
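
For reference, here is a minimal sketch of the call pattern as I understand it, assuming docx_summary() expects an rdocx object (as returned by read_docx()) rather than a file path, which would explain the first error. The document built here is only an illustration based on officer's default template, not my actual file, and the variable names are arbitrary:

```r
library(officer)

# read_docx() with no path loads officer's default template as an rdocx object
doc <- read_docx()

# Add two illustrative paragraphs (placeholder text, not the real file's content)
doc <- body_add_par(doc, "Question 10:")
doc <- body_add_par(doc, "Some text here also")

# docx_summary() takes the rdocx object and returns a data frame
# describing the body elements
content <- docx_summary(doc)

# Keep only the text of paragraph elements
body_text <- content$text[content$content_type == "paragraph"]
```

Since this pattern is the same one used for docx2, the second error may be specific to the content of my file (for example its numbered lists) rather than to the way the functions are called, but that is only a guess.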
