How to check Unicode ranges requiring more than 4 hex digits using JavaScript?


Following the question How to check Unicode input value in JavaScript?, I noticed that Unicode ranges whose code points need more than 4 hexadecimal digits (for example, the Grantha Unicode block) cannot be detected with the following code:

function checkGrantha(str) {
  return str.split('').some(function(char) {
    var charCode = char.charCodeAt(0);
    // Never true: charCodeAt returns a 16-bit code unit (at most 0xFFFF)
    return charCode >= 0x11300 && charCode <= 0x1137F;
  });
}

console.log('');

After some research I found this article, which says that ES6/ES2015 introduced a way to write code points in the astral planes (any code point that needs more than 4 hexadecimal digits) by wrapping the hex value in curly braces: '\u{XXXXX}', for example '\u{1F436}'.

But I could not work out how to apply this to the code above. Is there a way to fix this issue?


There is 1 answer

PolterGuest On BEST ANSWER
  • First of all, don't use str.split(''): it splits the string into 16-bit code units, so any character outside the BMP (i.e., in the astral planes) is broken into two surrogate halves. Use Array.from(str) instead, which splits the string by code point.

  • Next, for the same reason, don't use char.charCodeAt(0), which returns a single 16-bit code unit; use char.codePointAt(0) instead, which returns the full code point.

function checkGrantha(str)
{
    return Array.from(str).some(function(char) {
        var codePoint = char.codePointAt(0);
        return (
            codePoint >= 0x11300 && codePoint <= 0x1137F
        );
    });
}
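
To see why both substitutions matter, here is a minimal sketch comparing code units with code points. The sample character U+11305 (GRANTHA LETTER A) and the variable name sample are picked here for illustration only; they are not part of the original question.

```javascript
// U+11305 GRANTHA LETTER A, written as a code point escape (illustrative sample).
const sample = '\u{11305}';

// One character, but two UTF-16 code units (a surrogate pair).
console.log(sample.length);               // 2
console.log(sample.split('').length);     // 2 (two lone surrogates)
console.log(Array.from(sample).length);   // 1 (one code point)

// charCodeAt sees only the high surrogate; codePointAt decodes the whole pair.
console.log(sample.charCodeAt(0).toString(16));   // d804
console.log(sample.codePointAt(0).toString(16));  // 11305
```

Since a code unit can never exceed 0xFFFF, a comparison against 0x11300 can only succeed on the value returned by codePointAt.
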
  • Another possibility would be to use a simple regular expression with the 'u' flag:
function checkGrantha(str)
{
    return /[\u{11300}-\u{1137F}]/u.test(str);
}

or:

function checkGrantha(str)
{
    // Warning: this will miss U+1133B COMBINING BINDU BELOW whose Unicode 'Script' property is 'Inherited', not 'Grantha'...
    return /\p{Script=Grantha}/u.test(str);
}
console.log(checkGrantha('\u{11305}')); // U+11305 GRANTHA LETTER A -> true
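
The Array.from version above builds an intermediate array before testing it. As a rough alternative sketch, a for...of loop also walks a string by code point and can return on the first match. The names hasCodePointInRange and checkGranthaBlock are hypothetical, introduced here only for illustration.

```javascript
// Hypothetical helper: true if any code point of `str` lies in [first, last].
function hasCodePointInRange(str, first, last) {
  // for...of iterates a string by code point, so surrogate pairs stay together.
  for (const char of str) {
    const codePoint = char.codePointAt(0);
    if (codePoint >= first && codePoint <= last) {
      return true; // stop at the first match
    }
  }
  return false;
}

// Grantha block: U+11300 to U+1137F.
const checkGranthaBlock = (str) => hasCodePointInRange(str, 0x11300, 0x1137F);

console.log(checkGranthaBlock('abc \u{11305}')); // true
console.log(checkGranthaBlock('abc'));           // false
```

Passing the bounds as parameters lets the same helper cover other astral-plane blocks; like the block-range regular expression, it also matches unassigned code points inside the range.
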