Is there an R package for identifying which language a text is written in? I have many rows of text in different languages, such as "en", "es", "fr", "ja" and so on. Is it possible to get a result with a language column, like below?
id  text                 language
1   "I am a musician"    en
2   "я инженер"          ru
3   "Je suis un poète"   fr
Or is there any other way to determine the natural language of a text?
Your best shot is probably cldr, which uses Chrome's language detection library. However, your examples seem to be a bit too short.
As noted by Ben, textcat seems to perform better on the shorter examples given by gulnerman, but unlike cldr it doesn't indicate how reliable the matches are. This makes it difficult to say how much you can trust the results, even though two out of three were correct in this case.
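For reference, here is a minimal sketch of how the textcat approach could be applied to the example rows from the question. It assumes the textcat package is installed from CRAN; the data frame `df` and its column names are just the ones from the question, not anything the package requires. As far as I know, textcat() returns full language names (such as "english") rather than ISO codes like "en", so you would need your own mapping if you want the short codes.

```r
# Assumption: textcat is installed, e.g. via install.packages("textcat")
library(textcat)

# Rebuild the example data from the question
df <- data.frame(
  id = 1:3,
  text = c("I am a musician", "я инженер", "Je suis un poète"),
  stringsAsFactors = FALSE
)

# textcat() is vectorised: it returns one guess per input string
# (NA if nothing matches) and gives no reliability score
df$language <- textcat(df$text)

print(df)
```

With strings this short, treat the guesses as rough hints and spot-check them by hand before relying on the new column.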