How to use Unicode character classes in JavaScript regexes?


Is there a way to use patterns like "\p{L}" in JavaScript natively?

(I suppose that is a Perl-compatible syntax.)

I'm interested primarily in Firefox support, and possibly WebKit.


There are 4 answers

Jukka K. Korpela (accepted answer)

Unfortunately, no. You can only specify a set of characters in the usual syntax, writing characters and ranges in brackets, but this becomes awkward because letters, for example, are scattered all over the Unicode space, with other characters between them.

There’s an inefficient workaround: fetch the UnicodeData.txt file from the Unicode site, embed its content in your JavaScript code as data, and parse it. You could then hold the data, e.g., in an array of objects containing the Unicode properties, such as gc (General Category), which tells you whether the character is a letter or not. But even then, you would only have the data at hand for simple testing, not as something you can use as a constituent of a regexp.
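A minimal sketch of that workaround, assuming only a few sample lines of UnicodeData.txt are embedded as a string. The real file uses the same semicolon-separated layout (field 0 is the hex code point, field 1 the name, field 2 the General Category) but has far more lines, and it also contains "First"/"Last" range entries (e.g. for CJK ideographs) that this sketch does not expand. The helper names parseUnicodeData and isLetter are hypothetical.

```javascript
// Assumption: a tiny hand-picked excerpt standing in for the full UnicodeData.txt.
const UNICODE_DATA = [
  "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9",
  "0416;CYRILLIC CAPITAL LETTER ZHE;Lu;0;L;;;;;N;;;;0436;",
].join("\n");

// Hypothetical helper: turn each non-empty line into { cp, name, gc }.
// Note: "<..., First>" / "<..., Last>" range pairs are NOT expanded here.
function parseUnicodeData(text) {
  const records = [];
  for (const line of text.split("\n")) {
    if (!line) continue;
    const fields = line.split(";");
    records.push({
      cp: parseInt(fields[0], 16), // field 0: code point in hex
      name: fields[1],             // field 1: character name
      gc: fields[2],               // field 2: General Category, e.g. "Lu", "Nd"
    });
  }
  return records;
}

// Index the records by code point for quick lookup.
const byCodePoint = new Map(parseUnicodeData(UNICODE_DATA).map((r) => [r.cp, r]));

// Hypothetical helper: a character counts as a letter if its General
// Category starts with "L" (Lu, Ll, Lt, Lm, Lo). Unknown characters are not letters.
function isLetter(ch) {
  const rec = byCodePoint.get(ch.codePointAt(0));
  return rec !== undefined && rec.gc.startsWith("L");
}

console.log(isLetter("A"));      // true
console.log(isLetter("1"));      // false
console.log(isLetter("\u00E9")); // true (é)
```

As the answer says, this only gives you a lookup for simple tests; it does not plug into a RegExp by itself.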

In theory, you could use the data to construct a regexp, but it would be rather large.
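As a rough illustration of building such a regexp: the hypothetical helper buildCharClass below sorts a list of code points, merges consecutive ones into ranges, and emits them as \uXXXX escapes inside one bracketed class. It assumes every code point is in the Basic Multilingual Plane (at most U+FFFF), so no surrogate-pair handling is attempted. Fed with every letter from UnicodeData.txt, the resulting class string would be very long, which is the size problem mentioned above.

```javascript
// Format a BMP code point as a \uXXXX escape (as literal text for a RegExp source).
function hex4(cp) {
  return "\\u" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: code points -> "[...]" character class source.
// Assumption: all code points are <= 0xFFFF.
function buildCharClass(codePoints) {
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  const parts = [];
  let i = 0;
  while (i < sorted.length) {
    let j = i;
    // Extend the run while the next code point is consecutive.
    while (j + 1 < sorted.length && sorted[j + 1] === sorted[j] + 1) j++;
    parts.push(i === j ? hex4(sorted[i]) : hex4(sorted[i]) + "-" + hex4(sorted[j]));
    i = j + 1;
  }
  return "[" + parts.join("") + "]";
}

const cls = buildCharClass([0x43, 0x41, 0x42, 0xe9, 0x61]);
console.log(cls); // [\u0041-\u0043\u0061\u00E9]

const re = new RegExp("^" + cls + "+$");
console.log(re.test("CAB\u00E9a")); // true
console.log(re.test("ABD"));        // false
```

In practice you would feed it the code points whose gc starts with "L", as in the parsing approach described in this answer.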

YuS

No, JavaScript has a slightly different syntax. To match specific Unicode characters, you have to use escapes like \uXXXX. In practice, however, if your page and files are in UTF-8, putting non-ASCII characters directly in a range such as [абвг] works too.
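A small example of both approaches, using the Cyrillic lowercase range а to я (U+0430 to U+044F) as an assumed range of interest. The variable names are illustrative only.

```javascript
// Range written with \uXXXX escapes: works regardless of file encoding.
const escaped = /^[\u0430-\u044F]+$/;
// The same range typed directly; this relies on the source file being
// saved and served as UTF-8.
const literal = /^[а-я]+$/;

console.log(escaped.test("привет")); // true
console.log(literal.test("привет")); // true
console.log(escaped.test("hello"));  // false
// ё (U+0451) lies outside U+0430-U+044F, so neither pattern accepts it.
console.log(escaped.test("ёж"));     // false
```

Note that a hand-written range like this only covers the characters you list; it is not a general "any letter" class like \p{L}.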

http://www.javascriptkit.com/jsref/regexp.shtml

slevithan

No, \p{..} is not supported natively by any of the major browsers. However, it does work in JavaScript if you use the XRegExp library and its Unicode plugins.

Reid Johnson

The library found here:

http://inimino.org/~inimino/blog/javascript_cset

seems to work for me, and it is fairly small and independent of other libraries.