LXML to write in unicode?

1k views Asked by At

I am currently using lxml to write a file. I build the node and then I write it to a file using etree.tostring(node, pretty_print=True). However, it seems to be using htmlencoding --

<Synopsis>
    Abila schlie&#223;lich die ersten sechs Aufgaben zu meistern. Wird der Junge auch 
</Synopsis>

In order to decipher it and get it into the format I want it in, I am currently doing:

>>> print HTMLParser.HTMLParser().unescape('Abila schlie&#223;lich die ersten sechs Aufgaben zu meistern. Wird der Junge auch')
Abila schließlich die ersten sechs Aufgaben zu meistern. Wird der Junge auch

How would I have this write in unicode, or is this not possible with lxml ?

1

There are 1 answers

0
maxymoo On BEST ANSWER

Yes you can pass an encoding to etree.tostring method using the encoding parameter:

etree.tostring(node, pretty_print=True, encoding='unicode')

From the etree.tostring docs:

You can also serialise to a Unicode string without declaration by passing the unicode function as encoding (or str in Py3), or the name 'unicode'. This changes the return value from a byte string to an unencoded unicode string.