Extracting text from Open Document file generates UnicodeEncodeError

Question

Asked by mcenno on 29 November 2013 at 13:33 · 728 views

I'm trying to convert the notes attached to an OpenDocument Presentation file to text using odfpy. I can open the file, build a list of Notes objects, and extract what I believe are the paragraphs from them. This works until I try to print notes containing special characters (German umlauts such as ö, ä, ü), which raises an error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 17-19: ordinal not in range(128)

I realize I'm not the first to run into an encoding problem, and I'd happily dive into re-encoding the text. My problem is that I don't know how to convert the notes to proper strings in the first place. Here is my code:

import sys

from odf import text
from odf.opendocument import load
from odf.presentation import Notes

# Load the presentation named on the command line
doc = load(sys.argv[1])
slides = doc.presentation
notes = slides.getElementsByType(Notes)

# Print every paragraph found in each notes page
for page in notes:
    pars = page.getElementsByType(text.P)
    for p in pars:
        print p
I simply iterate over the elements and try to print them, hoping that magically the text from the notes will appear. I have deposited a sample presentation file at https://spideroak.com/browse/share/enno_middelberg/public/public to illustrate the issue.

Can anyone tell me how to get the text out of the ODF elements and into a string?

Many thanks,

Enno

There is 1 answer

**alexanderlukanin13** · Accepted Answer · 2013-11-29T13:46:10+00:00

print p converts p with str(p), which fails because p contains non-ASCII text that the 'ascii' codec cannot encode.

Use print unicode(p) instead.
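The failing str(p) conversion can be reproduced without odfpy. The sketch below is only an illustration: `note_text` is a hypothetical stand-in for the text held by a paragraph element, and `to_utf8_bytes` is a made-up helper, not part of odfpy. It assumes the error comes from an implicit encode with the ASCII codec, which is what the traceback suggests.

```python
# -*- coding: utf-8 -*-
# Illustration only: note_text stands in for the text of a notes paragraph.


def to_utf8_bytes(text):
    """Encode text explicitly as UTF-8 rather than relying on a default codec."""
    return text.encode("utf-8")


note_text = u"Gr\u00fc\u00dfe: \u00f6\u00e4\u00fc"  # "Grüße: öäü"

# Encoding with the ASCII codec fails for any character above code point 127,
# which matches the UnicodeEncodeError reported in the question.
try:
    note_text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit UTF-8 encode succeeds and round-trips without loss.
encoded = to_utf8_bytes(note_text)
roundtrip = encoded.decode("utf-8")
```

Under Python 2, keeping the value as unicode (as in print unicode(p)) or encoding it explicitly before writing it out avoids the implicit ASCII step. Whether the terminal then displays the umlauts correctly depends on its own encoding.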
