The man pages for uconv say:
-x transliteration
Run the given transliteration on the transcoded Unicode data, and use the transliterated data as input for the transcoding to the destination encoding.
It also includes the following two examples:
echo '\u30ab' | uconv -x 'hex-any; any-name'
uconv -f utf-8 -t utf-8 -x '::nfkc; [:Cc:] >; ::katakana-hiragana;'
The first example suggests that the -x option takes a "compound transform", but the second suggests it takes a "rule-based transliterator".
This is exacerbated by the fact that many of ICU's provided examples (1, 2) don't work:
$ echo "Example" | uconv -f UTF8 -t UTF8 -x 'NFD; [:Nonspacing Mark:] Remove; NFC;'
Couldn't create transliteration "NFD; [:Nonspacing Mark:] Remove; NFC;": U_MISSING_OPERATOR, line 0, offset 0.
$ echo "Example" | uconv -f UTF8 -t UTF8 -x '[:Latin:]; NFKD; Lower; Latin-Katakana;'
Couldn't create transliteration "[:Latin:]; NFKD; Lower; Latin-Katakana;": U_MISSING_OPERATOR, line 0, offset 0.
But some examples (1, 2) work just fine:
$ echo "Example" | uconv -f UTF8 -t UTF8 -x '[aeiou] Upper'
ExAmplE
$ echo "Example" | uconv -f UTF8 -t UTF8 -x 'NFKD; Lower; Latin-Katakana;'
エクサンプレ
So what the heck does -x define?
The plot thickens! It looks like uconv chokes on predefined character classes that aren't inside a transform rule.
Regular character classes:
$ echo "Example" | uconv -f UTF8 -t UTF8 -x '[a-zA-Z] Upper'
EXAMPLE
$ echo "Example" | uconv -f UTF8 -t UTF8 -x ':: [a-zA-Z] Upper;'
EXAMPLE
Predefined character classes:
$ echo "Example" | uconv -f UTF8 -t UTF8 -x '[:alpha:] Upper'
Couldn't create transliteration "[:alpha:] Upper": U_MISSING_OPERATOR, line 0, offset 0.
$ echo "Example" | uconv -f UTF8 -t UTF8 -x ':: [:alpha:] Upper;'
EXAMPLE
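Since the ::-wrapped forms above are accepted, one possible workaround is to prefix every ';'-separated transform with '::' before handing it to -x. The helper below is a hypothetical sketch (to_rules is not part of uconv): it only rewrites a plain ID list into rule syntax, and whether uconv accepts every rewritten string is an assumption, not something these results prove.

```shell
# Hypothetical helper: turn an ID list such as 'NFD; Lower; NFC;' into
# rule syntax ('::NFD;::Lower;::NFC;') so each transform is an ID statement.
to_rules() {
  # Split on ';' (one transform per line), strip leading whitespace,
  # drop empty statements, wrap each as '::ID;', then join back together.
  printf '%s\n' "$1" |
    tr ';' '\n' |
    sed -e 's/^[[:space:]]*//' \
        -e '/^$/d' \
        -e 's/^/::/' \
        -e 's/$/;/' |
    tr -d '\n'
}

printf '%s\n' "$(to_rules 'NFD; [:Nonspacing Mark:] Remove; NFC;')"
# prints ::NFD;::[:Nonspacing Mark:] Remove;::NFC;
```

The result would then be passed as, for example, uconv -f UTF8 -t UTF8 -x "$(to_rules 'NFD; [:Nonspacing Mark:] Remove; NFC;')".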
Just in case, here's the version of uconv I'm using:
$ uconv --version
uconv v2.1 ICU 58.1
It does different things depending on what you pass.
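As I read uconv.cpp (treat this as an assumption to check against the ICU 58.1 source), the -x value goes to Transliterator::createFromRules() if it contains ':', '<' or '>', and to Transliterator::createInstance() otherwise. The shell function below is a hypothetical model of only that dispatch (uconv_x_path is my own name, not part of uconv), so the question's examples can be sorted without running uconv:

```shell
# Hypothetical model of how uconv appears to dispatch its -x argument
# (assumed from uconv.cpp, not an official interface):
#   contains ':', '<' or '>'  -> Transliterator::createFromRules()
#   otherwise                 -> Transliterator::createInstance() (transform ID)
uconv_x_path() {
  case "$1" in
    *:* | *'<'* | *'>'*) echo createFromRules ;;
    *) echo createInstance ;;
  esac
}

uconv_x_path '[aeiou] Upper'                 # prints createInstance
uconv_x_path 'NFKD; Lower; Latin-Katakana;'  # prints createInstance
uconv_x_path '[:alpha:] Upper'               # prints createFromRules
uconv_x_path ':: [:alpha:] Upper;'           # prints createFromRules
```

Under this model, '[:alpha:] Upper' is rejected not because of the character class itself but because its ':' sends it down the rules path, where a bare "[:alpha:] Upper" is not a valid statement; ':: [:alpha:] Upper;' takes the same path but is a valid ID statement.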
The excerpt below is formatted code from uconv.cpp. Here, translit is the value of the -x argument. createFromRules, in turn, creates different things depending on the input:
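According to the ICU API reference for Transliterator::createFromRules(), the result is a rule-based transliterator if the string contains only rules, a compound transliterator if it contains ::ID blocks, and a null transliterator if those ID blocks parse as empty for the given direction. The hypothetical helper below (rules_kind is my own name) roughly approximates the first two cases by checking whether any ';'-separated statement starts with '::'. It ignores quoting, escapes and the empty-ID case, so it is a sketch rather than a parser:

```shell
# Hypothetical approximation of what Transliterator::createFromRules() builds.
# Naive: splits on ';', ignores quoting/escapes, and skips the
# null-transliterator case (ID blocks that parse as empty).
rules_kind() {
  # One statement per line with leading whitespace stripped; any "::"
  # statement is an ID block, which makes the result compound.
  if printf '%s\n' "$1" | tr ';' '\n' | sed 's/^[[:space:]]*//' | grep -q '^::'; then
    echo CompoundTransliterator
  else
    echo RuleBasedTransliterator
  fi
}

rules_kind '::nfkc; [:Cc:] >; ::katakana-hiragana;'  # prints CompoundTransliterator
rules_kind '[:Cc:] >;'                               # prints RuleBasedTransliterator
```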