How to transliterate Unicode text to ASCII with PyICU?


There is the PyICU library, which I understand can be used to transliterate strings. However, there are no docs. Does anyone have a simple example that transliterates a Unicode string to ASCII with PyICU?

The C++ ICU documentation for transliteration is here, but I don't understand how to call it from Python.


There are 2 answers

Tavian Barnes (best answer):

There is a nice cheat sheet for PyICU here: https://gist.github.com/dpk/8325992

Here's a slightly modified example:

>>> import icu
>>> tl = icu.Transliterator.createInstance('Any-Latin; Latin-ASCII')
>>> tl.transliterate('Ψάπφω')
'Psappho'
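
The same call can be wrapped in a small script-level helper. This is a minimal sketch, assuming PyICU is installed and importable as `icu`; the names `to_ascii` and `TRANSLIT_ID` are hypothetical and not part of PyICU. The final check is there because characters the transliterator has no rule for may pass through unchanged, so the result is not guaranteed to be pure ASCII.

```python
# Sketch only: assumes PyICU is installed and importable as `icu`.
try:
    import icu
except ImportError:  # PyICU missing; to_ascii() raises if called
    icu = None

# Compound ID from the example above: first map any script to Latin,
# then fold accented Latin letters down to plain ASCII.
TRANSLIT_ID = 'Any-Latin; Latin-ASCII'


def to_ascii(text):
    """Transliterate `text` to ASCII (hypothetical helper, not a PyICU API)."""
    if icu is None:
        raise RuntimeError('PyICU is not installed')
    tl = icu.Transliterator.createInstance(TRANSLIT_ID)
    result = tl.transliterate(text)
    # Characters without a transliteration rule may be left as-is,
    # so verify the output instead of assuming it is ASCII.
    if not result.isascii():
        raise ValueError('non-ASCII characters remain: %r' % result)
    return result
```

Calling `to_ascii('Ψάπφω')` should give the same result as the interactive session above. Creating the transliterator once and reusing it would be cheaper if many strings are converted.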

happy coder:

From the first link you gave, I am assuming that (1) you have already built PyICU and (2) you have made sure that the library is accessible. If not, see the documentation on your linked page.

I found this documentation from your link:

To convert a Python str encoded in a encoding other than utf-8 to an ICU UnicodeString use the UnicodeString(str, encodingName) constructor.

So you need to find the encodingName. I guess yours would be ASCII, but I haven't checked, so you should make sure that it is correct.

Then I suppose you would do something like this:

>>> from icu import UnicodeString
 . 
 .
 . 
>>> string = UnicodeString(strToConvert, 'ascii')  # the encoding name is passed as a string

That is just a quick idea; your mileage may vary. You might want to check the website, as it gives more examples and shows how to do things the "Python way" or the "ICU way". Cheers!