I am looking for a word tokenizer library for Node.js that supports as many languages as possible. I'd like to pass in a string, like tokenize('Hello, world!', 'en'), and have it return ['Hello', 'world']. The number of supported languages is more important than precision.
JavaScript word tokenizer library with support for multiple languages (as many as possible)
2.5k views Asked by Ognjen At
Wink's tokenizer supports two scripts (Latin and Devanagari) and all the languages written in them. It also detects the language automatically, so you do not need to pass a language code when tokenizing.
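As a rough sketch of how this could look in practice (assuming the wink-tokenizer package is installed from npm): its tokenize() call returns objects with a value and a tag, so getting a plain array of words means filtering on the 'word' tag. The helper names keepWords and tokenizeWords are made up for this example and are not part of the library.

```javascript
// Sketch only: keepWords and tokenizeWords are illustrative helper names,
// not part of wink-tokenizer itself.

// Keep only the tokens tagged as words and return their text.
function keepWords(tokens) {
  return tokens
    .filter((token) => token.tag === 'word')
    .map((token) => token.value);
}

// Tokenize a string with wink-tokenizer (assumes `npm install wink-tokenizer`).
function tokenizeWords(text) {
  // Loaded lazily so keepWords can be used without the package installed.
  const winkTokenizer = require('wink-tokenizer');
  const tokenizer = winkTokenizer();
  // tokenize() returns objects of the form { value, tag }.
  return keepWords(tokenizer.tokenize(text));
}
```

Calling tokenizeWords('Hello, world!') should then give back only the word values, with punctuation tokens dropped. Note that no language code is passed in, unlike the tokenize(text, 'en') signature in the question.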
You can check out the docs at https://winkjs.org/wink-tokenizer/.