Python transliterator that uses the same rules as the PHP one


I need a transliterator for Python that is configured the same way as my PHP one. My PHP-based transliterator is created from these rules:

$transliterator = Transliterator::createFromRules(
    ':: NFD;'
    . ' :: [:Nonspacing Mark:] Remove;'
    . ' :: NFC;'
    . ' :: [:Punctuation:] Remove;'
    . ' :: Lower();',
    Transliterator::FORWARD
);

At the moment I am using the slugify library for Python, which only gives a close-enough result. Because of this mismatch, any text that must be transliterated identically on both sides (PHP and Python) has to be sent to an API endpoint on the PHP back-end, which returns the transliterated string.

Is there any way to apply the same rules directly in Python?


There is 1 answer

Andj On

Use PyICU, a Python wrapper around ICU4C (the same ICU library that PHP's intl extension is built on).

Assuming you already have icu4c installed and accessible to Python, install PyICU:

pip install -U PyICU

The syntax is virtually identical in PyICU and PHP. The only real difference is that PyICU takes an ID (a label) for the transliterator as its first argument:

icu.Transliterator.createFromRules(label, rules, direction)

So:

import icu

# Same rule string as the PHP version; adjacent string literals are concatenated.
rules = (
    ':: NFD;'
    ' :: [:Nonspacing Mark:] Remove;'
    ' :: NFC;'
    ' :: [:Punctuation:] Remove;'
    ' :: Lower();'
)
direction = icu.UTransDirection.FORWARD
# "customClean" is an arbitrary ID for this rule-based transliterator.
transliterator = icu.Transliterator.createFromRules("customClean", rules, direction)
s = "Nāgārjuna!"
print(transliterator.transliterate(s))
# nagarjuna

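More generally, because PyICU and PHP's intl extension both wrap ICU, PyICU should give you equivalent functionality to intl. For identical output on both sides, it helps to have PHP and Python built against the same ICU version.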
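To make it clearer what each of the five rules does, here is a rough standard-library sketch of the same pipeline. It is only an approximation, not a replacement for ICU: the function name `clean` is made up for this example, and I am assuming that `[:Nonspacing Mark:]` corresponds to general category `Mn` and `[:Punctuation:]` to the categories starting with `P`. Python's Unicode data and its `str.lower()` may also differ from your ICU build in edge cases.

```python
import unicodedata


def clean(text: str) -> str:
    """Approximate the ICU rule chain with the standard library only."""
    # :: NFD;  (split accented letters into base letter + combining marks)
    decomposed = unicodedata.normalize("NFD", text)
    # :: [:Nonspacing Mark:] Remove;  (drop characters in category Mn)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # :: NFC;  (recompose whatever is left)
    recomposed = unicodedata.normalize("NFC", stripped)
    # :: [:Punctuation:] Remove;  (drop categories Pc, Pd, Ps, Pe, Pi, Pf, Po)
    no_punct = "".join(
        ch for ch in recomposed
        if not unicodedata.category(ch).startswith("P")
    )
    # :: Lower();
    return no_punct.lower()


print(clean("Nāgārjuna!"))
# nagarjuna
```

If the Python side really has to match PHP byte for byte, I would still use the PyICU version above and treat this one only as an illustration of the individual steps.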