`Backward slash + b` does not work as expected on regex


In JavaScript, it's a known problem that \b does not work well with strings containing special characters: the engine treats characters like ç as non-word characters, so it finds word boundaries in the wrong places.
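A small sketch of that behaviour, assuming the standard definition in which \b is based on \w and \w matches only [A-Za-z0-9_]. The `isWordChar` helper and the sample string "açb" are purely illustrative:

```javascript
// \b is defined in terms of \w, and \w matches only [A-Za-z0-9_].
// isWordChar is a hypothetical helper used only for this illustration.
const isWordChar = (ch) => /^\w$/.test(ch);

// "ç" is not a word character, so \b fires on both sides of it
// and the string splits into two "words".
const words = "açb".match(/\b\w+\b/g);

console.log(isWordChar("c")); // true
console.log(isWordChar("ç")); // false
console.log(words);           // [ 'a', 'b' ]
```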

So I have this code:

"aaa aabb cc ccc abba".replace(/\b((.)\2+)\b|(.)\3+/g,"$1$3");

It correctly returns aaa ab cc ccc aba. However, if the input string contains special characters, it no longer works. For example:

"ááá áább çç ççç ábbá".replace(/\b((.)\2+)\b|(.)\3+/g,"$1$3");

The code above returns á ább ç ç ábbá, which is not expected; it should have returned ááá áb çç ççç ábá.

So I decided to stop using \b, because the only word boundaries I want to accept are a space and the beginning/end of the string (^ or $). I tried this regex:

"ááá áább çç ççç ábbá".replace(new RegExp("(^| )((.)\\3+)( |$)|(.)\\5+","g"),"$1$2$4$5");

It returned ááá áb çç ç ábá, which is almost correct; it should have returned ááá áb çç ççç ábá.
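One way to see where this attempt goes wrong is to list the raw matches. The sketch below reuses the regex and input from above (the variable names are illustrative). It suggests that the first alternative consumes the trailing space, so the following word no longer has a leading space or ^ to match against:

```javascript
const input = "ááá áább çç ççç ábbá";
const re = new RegExp("(^| )((.)\\3+)( |$)|(.)\\5+", "g");

// Collect the full text of every match, in order.
const matches = [...input.matchAll(re)].map((m) => m[0]);

console.log(matches);
// [ 'ááá ', 'áá', 'bb', ' çç ', 'ççç', 'bb' ]
// ' çç ' swallows the space before 'ççç', so 'ççç' can only match the
// second alternative and gets collapsed to a single 'ç'.
```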

How can I make the last regex work without using lookaheads, lookbehinds, or any other lookaround? Is there an easy fix to the last regex? Or is there a fix that makes \b work as expected?
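A minimal sketch of one possible lookaround-free workaround, not a fix to the regex itself. It assumes words are separated by plain spaces only: it matches each run of non-space characters and decides per word in a replacer callback. `collapseRepeats` is a hypothetical name. Unlike the regexes above, this version leaves runs of spaces untouched, and it assumes accented letters are precomposed, since `.` without the u flag matches a single UTF-16 code unit.

```javascript
// Hypothetical helper: collapse repeated characters inside each
// space-delimited word, unless the whole word is one repeated character.
function collapseRepeats(text) {
  return text.replace(/[^ ]+/g, (word) =>
    /^(.)\1+$/.test(word)
      ? word                             // e.g. "ççç" is kept as-is
      : word.replace(/(.)\1+/g, "$1")    // e.g. "áább" becomes "áb"
  );
}

console.log(collapseRepeats("aaa aabb cc ccc abba")); // aaa ab cc ccc aba
console.log(collapseRepeats("ááá áább çç ççç ábbá")); // ááá áb çç ççç ábá
```

Because no match ever includes a space, neighbouring words cannot steal each other's delimiter, which is the failure mode of the `(^| )...( |$)` attempt.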
