I'm trying to convert the notes attached to an OpenDocument Presentation file to text, using odfpy. I managed to open the file, build a list of 'notes' objects, and extract what I believe are paragraphs from them. This works until I try to print notes containing special characters (German umlauts: ö, ä, ü), which raises an error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 17-19: ordinal not in range(128)
Now I figured out that I'm not the first to encounter an encoding problem, and I'd happily dive into re-encoding the text. My problem is that I don't know how to convert the notes to proper strings. Here is my code:
    import sys
    from odf.presentation import Notes
    from odf.opendocument import load
    from odf import text

    doc = load(sys.argv[1])
    slides = doc.presentation
    notes = slides.getElementsByType(Notes)
    for page in notes:
        pars = page.getElementsByType(text.P)
        for p in pars:
            print p
I simply iterate over the elements and try to print them, hoping that magically the text from the notes will appear. I have deposited a sample presentation file at https://spideroak.com/browse/share/enno_middelberg/public/public to illustrate the issue.
Can anyone enlighten me how to get the text out of the ODF elements and into a string?
Many thanks,
Enno
str(p) fails because p contains non-ASCII text. Use:

    print unicode(p)
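The underlying failure can be reproduced without odfpy. Here is a minimal sketch, written in Python 3 syntax, that makes explicit the ASCII encoding step which Python 2's str() performs implicitly on Unicode text. The string `note_text` and the helper `can_encode` are made up for illustration and are not part of odfpy.

```python
# Hypothetical sample standing in for the text of one note paragraph.
note_text = u"Gr\u00fc\u00dfe aus K\u00f6ln"


def can_encode(text, codec):
    """Return True if text can be encoded with codec, False otherwise."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        # Same exception type as in the traceback from the question.
        return False


# ASCII cannot represent umlauts, so this is the step that fails.
print(can_encode(note_text, "ascii"))
# UTF-8 can represent every Unicode character.
print(can_encode(note_text, "utf-8"))
```

If the output is redirected to a file or pipe, Python 2 may fall back to ASCII for Unicode output as well, in which case encoding explicitly, e.g. unicode(p).encode('utf-8'), is probably the safer choice.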