Empty value is not actually empty?

I get false when testing the string "AA8187517" against this regex:

if (/^[a-z0-9]*$/i.test(value))

Then I see this in the console:

> value.split('').filter(function(el){ return el != '' })
<  (11) ["", "A", "A", "8", "1", "8", "7", "5", "", "1", "7"]

What are these two values in the array?

3 Answers

3
melpomene On Best Solutions

The two "empty" values in your array contain character 8207 (decimal), which is 200f (hex).

U+200F is RIGHT-TO-LEFT MARK in Unicode, an invisible formatting character that affects the direction in which text is displayed.

Here's a reproduction of your issue plus sample code to remove the character:

let value = "\u200FAA81875\u200F17";

console.log(value.split('').map(function (x) { return x.charCodeAt(0); }));

value = value.replace(/\u200F/g, '');

console.log(/^[a-z0-9]*$/i.test(value));
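If other invisible characters might sneak in as well (for example through copy and paste), a broader variant can be sketched with a Unicode property escape. This assumes an ES2018+ engine, and the helper names `describeCodePoints` and `stripFormatChars` are made up for illustration. `\p{Cf}` matches the Unicode "Format" category, which includes U+200F and its counterpart U+200E.

```javascript
// Hypothetical helpers, not part of the original answer.

// List every code point as U+XXXX so invisible characters become visible.
function describeCodePoints(str) {
  return Array.from(str, function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  });
}

// Remove all characters in the Unicode "Format" (Cf) category.
function stripFormatChars(str) {
  return str.replace(/\p{Cf}/gu, '');
}

const raw = '\u200FAA81875\u200F17';
console.log(describeCodePoints(raw).join(' '));

const cleaned = stripFormatChars(raw);
console.log(cleaned, /^[a-z0-9]*$/i.test(cleaned));
```

Whether stripping the whole category is appropriate depends on the input: some scripts rely on format characters, so for free text it may be safer to remove only the specific characters you have actually observed.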

-2
Mark Minerov On

No, you just need to save the result in a new variable:

let array = ["text", "text2", '', 'text3'];

let a = array.filter((currentValue) => currentValue);
console.log(a); // ["text", "text2", "text3"]

1
Anthony Rutledge On

Four thoughts.

1) I would spell out the character class explicitly, where practical, instead of relying on modifiers such as the i flag. Be wary of the * quantifier, as it may allow too much, including the absence of a value!

if (/^[A-Za-z0-9]*$/.test(value)) {

}
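To illustrate the point about `*`, here is a minimal comparison with the `+` quantifier. The variable names are only for this sketch.

```javascript
const allowsEmpty = /^[A-Za-z0-9]*$/; // zero or more characters
const requiresOne = /^[A-Za-z0-9]+$/; // one or more characters

console.log(allowsEmpty.test(''));          // true: the empty string passes
console.log(requiresOne.test(''));          // false
console.log(requiresOne.test('AA8187517')); // true
```

If an empty value should be rejected, `+` (or an explicit length range such as `{9}`) is probably the better fit.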

2) Examine closely the definition of String.prototype.split(), and what happens when you use the empty string as the separator.

Attention: If an empty string ("") is used as the separator, the string is not split between each user-perceived character (grapheme cluster) or between each unicode character (codepoint) but between each UTF-16 codeunit. This destroys surrogate pairs. See also How do you get a string to a character array in JavaScript? on stackoverflow.
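A small sketch of the difference, using an emoji as an assumed example of a character outside the Basic Multilingual Plane:

```javascript
// "A😀B": the emoji is one code point but two UTF-16 code units.
const s = 'A\u{1F600}B';

const byCodeUnit = s.split('');    // splits the surrogate pair in half
const byCodePoint = Array.from(s); // iterates by code point

console.log(byCodeUnit.length);  // 4
console.log(byCodePoint.length); // 3
```

Note that `Array.from` still works per code point, not per user-perceived character, so combined sequences can still be split apart.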

3) Could .trim() be of any use to you here?

value.trim().split('').filter(function(el){ return el != '' })
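One caveat worth checking before relying on .trim(): it removes whitespace and line terminators, and U+200F is a format character rather than whitespace, so it should be left in place. A quick check, using the reproduction string from the answer above as an assumed input:

```javascript
const value = '\u200FAA81875\u200F17';
const trimmed = value.trim();

console.log(trimmed.length === value.length); // true: nothing was removed
console.log(trimmed.charCodeAt(0));           // 8207
```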

4) Consider changing your filter predicate (the callback method).

value.trim().split('').filter(function(element){ return /^[A-Za-z0-9]$/.test(element) })

However, investigate the significance of splitting on the empty string and know the encoding of your source strings. Since you are filtering, you should not need to replace before filtering. Filtering alone should be sufficient. You want to whitelist wanted values, as blacklisting by replacing is bound to get you in trouble here.
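The whitelisting idea can be sketched as a small helper. The name `keepAlphanumerics` is hypothetical, and the allowed set (ASCII letters and digits) is an assumption based on the regex in the question.

```javascript
// Hypothetical helper: keep only ASCII letters and digits, drop everything else.
function keepAlphanumerics(str) {
  return Array.from(str)
    .filter(function (ch) { return /^[A-Za-z0-9]$/.test(ch); })
    .join('');
}

const cleaned = keepAlphanumerics('\u200FAA81875\u200F17');
console.log(cleaned); // "AA8187517"
```

Because anything outside the allowed set is dropped, invisible characters disappear without having to be named one by one.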

Stack Overflow: How do you get a string to a character array in JavaScript?