R - transliterating into German alphabet using stri_trans_general()


I have a large number of names, mostly using a German character set, i.e., ASCII plus ä,ö,ü,ß. Some names use special characters (e.g. ğ) which I would like to transliterate into the German version. So, "Özoğuz" should become "Özoguz".

I have tried

stringi::stri_trans_general("Özoğuz", "de-ASCII")

but that returns "Oezoguz" rather than the desired "Özoguz".

Answer by SamR (best answer)

The de-ASCII rule set transliterates Ö to Oe. If you want to deviate from this rule but otherwise keep the German ASCII rule set, the stringi docs state that custom rule-based transliteration is also supported.

We can define rules that map upper- and lower-case Ö to placeholder characters, apply the de-ASCII rules to everything else, and then map the placeholders back to Ö and ö:

id <- "
    Ö > \u2135;
    ö > \u2136;
    :: de-ASCII;
    \u2135 >  Ö;
    \u2136 > ö
"

stringi::stri_trans_general("Özoğuz", id, rules = TRUE)
# [1] "Özoguz"

I have used "ℵ" and "ℶ" as placeholders for upper- and lower-case Ö respectively, but any Unicode characters that you are sure will not occur in your strings, and that de-ASCII itself leaves unchanged, should work.
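
The rules above protect only Ö and ö, whereas the question also wants ä, ü and ß left alone. As a rough sketch of one alternative (a different technique from the rule-based one above, not something taken from the stringi docs), the string can be split into characters and only those outside a keep-list passed through de-ASCII. The helper name translit_keep and its keep argument are made up for this illustration. Because each character is transliterated in isolation, any context-dependent rules in de-ASCII will not see neighbouring characters.

```r
# Hypothetical helper (not part of stringi): transliterate with de-ASCII,
# but leave the characters listed in `keep` untouched.
translit_keep <- function(x,
                          keep = c("\u00e4", "\u00f6", "\u00fc",   # ä ö ü
                                   "\u00c4", "\u00d6", "\u00dc",   # Ä Ö Ü
                                   "\u00df")) {                    # ß
  # Compose to NFC so that "o" + combining diaeresis is compared as one "ö".
  x <- stringi::stri_trans_nfc(x)
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # Split into user-perceived characters.
    chars <- stringi::stri_split_boundaries(s, type = "character")[[1]]
    # Transliterate only the characters that are not protected.
    todo <- !(chars %in% keep)
    chars[todo] <- stringi::stri_trans_general(chars[todo], "de-ASCII")
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

translit_keep("\u00d6zo\u011fuz")  # input is "Özoğuz", written with escapes
# [1] "Özoguz"
```

This avoids having to pick placeholder characters, at the cost of one transliteration call per string, so it may be noticeably slower than the single rule-based call on a very large vector of names.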