Reading docx files, recognizing and storing italicized text

Question

2.7k views Asked by Blythe Simmons At 09 June 2015 at 19:07

How should I go about reading a .docx file with Python, recognizing the italicized text, and storing it as a string?

I looked at the docx Python package, but all I see are features for writing to a .docx file.

Thanks in advance for the help.

There are 2 answers

Miles Shipman On 11 June 2015 at 00:39

Your best bet is to unzip the .docx file (it is a ZIP archive), which will create a directory called word. Within that directory is document.xml. From there you would need to learn the XML structure and keywords well enough to pick out just the italicized text. Once you have done that, all you have to do is pull the text strings from the XML file.
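As a rough sketch of the unzip-and-parse approach described above, using only the standard library: it assumes italics are applied directly on a run through a `w:i` element inside `w:rPr`, so it will not catch italics that come from a style. The function names `italic_runs_from_xml` and `italic_runs` are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
# Values of w:val that switch a toggle property such as w:i off
OFF_VALUES = {"0", "false", "off"}


def italic_runs_from_xml(xml_bytes):
    """Return the text of every run that carries a direct italic flag."""
    root = ET.fromstring(xml_bytes)
    found = []
    for run in root.iter("{%s}r" % W_NS):
        italic = run.find("w:rPr/w:i", NS)
        if italic is None:
            continue  # no direct italic formatting on this run
        if italic.get("{%s}val" % W_NS, "true") in OFF_VALUES:
            continue  # italic explicitly turned off
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if text:
            found.append(text)
    return found


def italic_runs(docx_path):
    """Open the .docx as a ZIP archive and scan word/document.xml."""
    with zipfile.ZipFile(docx_path) as archive:
        return italic_runs_from_xml(archive.read("word/document.xml"))
```

Calling `italic_runs("TestDocument.docx")` would return a list of strings, one per italic run; join them if you want a single string.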

**ChrisGuest** · Accepted Answer · 2015-06-11T01:53:00+00:00

Here's what my example document, TestDocument.docx, looks like.

[Image: screenshot of TestDocument.docx]

Note: The word "Italics" is formatted in italics directly, but "Emphasis" uses the style Emphasis.

If you install the python-docx module, this is a fairly simple exercise.

>>> from docx import Document
>>> document = Document('TestDocument.docx')
>>> for p in document.paragraphs:
...     for run in p.runs:
...         print(run.text, run.italic, run.bold)
... 
Test Document None None
Italics True None
Emp None None
hasis None None
>>> [[run.text for run in p.runs if run.italic] for p in document.paragraphs]
[[], ['Italics'], []]

The Run.italic attribute captures whether the text is directly formatted as italic (None means the setting is inherited rather than set on the run). It doesn't know whether a run has a style that is rendered in italic, but that case can be detected by checking Run.style.name, provided you know which styles in your document are rendered in italics.

>>> [[run.text for run in p.runs if run.style.name=='Emphasis'] for p in document.paragraphs]
[[], [], ['Emp', 'hasis']]
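To get what the question asks for (the italic text stored as strings), the two checks above can be combined. This is only a sketch: the helper names `is_italic` and `italic_text` are invented here, and the tuple of style names is an assumption you would adjust to whichever styles render as italic in your own document.

```python
def is_italic(run, italic_styles=("Emphasis",)):
    """True if the run is italic directly or uses a listed style name."""
    if run.italic:  # True only for direct formatting; None means inherited
        return True
    style = run.style
    return style is not None and style.name in italic_styles


def italic_text(document, italic_styles=("Emphasis",)):
    """Return one string per paragraph that contains italic runs."""
    results = []
    for paragraph in document.paragraphs:
        # Word may split one word across several runs, so join them back up
        text = "".join(
            run.text for run in paragraph.runs if is_italic(run, italic_styles)
        )
        if text:
            results.append(text)
    return results


# Usage with python-docx (pip install python-docx):
#   from docx import Document
#   strings = italic_text(Document("TestDocument.docx"))
```

Joining the runs per paragraph means a word that Word has split across runs, like "Emp" and "hasis" above, comes back as a single string.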
