How to migrate this regex to JavaScript


I have this regex that works perfectly with PHP:

$que = preg_replace("/(?<=^| )([a-z])\\1*(?= |$)\K|([a-z])(?=\\2)/","$3",$que);

This regex removes repeated characters inside words (for example, axxd becomes axd), but it leaves a word that consists entirely of one repeated character alone (xxx stays xxx). The problem I am facing is that it does not work in JS (I think the lookbehind is not supported in JS).

More examples:

  1. the string aaa baaab xxx would become aaa bab xxx
  2. the string ahhj aaab cc iiik would become ahj ab cc ik

Do you have a solution for this that is at least reasonably efficient? I will probably run this regex on strings of around 1k characters, so if it is not efficient, the browser may freeze.


There are 3 answers

2
Nick On

Lookbehind (yours is a positive lookbehind, not a negative one) is unlikely to be your issue, as it is supported in almost all current browser releases. However, JavaScript regex doesn't recognise \K as a meta sequence and treats it as a literal K instead. You can work around that using this regex:

\b([a-z])\1+(?!\1|\b)|(?<=([a-z]))((?!\2)[a-z])\3+

This matches either \b([a-z])\1+(?!\1|\b):

  • \b : word boundary
  • ([a-z]) : a letter, captured in group 1
  • \1+ : one or more repetitions of the captured letter
  • (?!\1|\b) : negative lookahead asserting that the next position is neither another repetition of the captured letter nor a word boundary

or (?<=([a-z]))((?!\2)[a-z])\3+:

  • (?<=([a-z])) : a positive lookbehind for a letter, captured in group 2
  • ((?!\2)[a-z]) : another letter which is not the same as the previously captured letter, captured in group 3
  • \3+ : one or more repetitions of the captured letter

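The first part of the regex matches repeated letters at the beginning of a word; the second part matches repeated letters in the middle or at the end of a word. Because of the (?!\1|\b) lookahead, a word that consists entirely of one repeated letter (such as xxx) is not matched at all.

You can then replace with $1$3. Only one of the two groups participates in any given match, and JavaScript substitutes an empty string for the other, so each matched run of repeated letters is replaced with a single copy of that letter.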

Regex demo on regex101

In JavaScript:

console.log('aaa baaab xxx fjjj'.replace(/\b([a-z])\1+(?!\1|\b)|(?<=([a-z]))((?!\2)[a-z])\3+/g, '$1$3'))
console.log('ahhj aaab cc iiik'.replace(/\b([a-z])\1+(?!\1|\b)|(?<=([a-z]))((?!\2)[a-z])\3+/g, '$1$3'))
console.log('bbb ahhj aaab cc iiik xxx fjjj baaaaaab yyyaaa'.replace(/\b([a-z])\1+(?!\1|\b)|(?<=([a-z]))((?!\2)[a-z])\3+/g, '$1$3'))
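
To see which alternative fires for each match, the same pattern can be used with a replacer callback instead of the '$1$3' string. This is only a sketch: the names collapsePattern and collapseWithTrace are made up for illustration and are not part of the original answer.

```javascript
// Same pattern as in the answer above.
const collapsePattern = /\b([a-z])\1+(?!\1|\b)|(?<=([a-z]))((?!\2)[a-z])\3+/g;

// Hypothetical helper: returns the collapsed string plus a log of each match.
function collapseWithTrace(input) {
  const trace = [];
  const output = input.replace(collapsePattern, (match, g1, g2, g3) => {
    // In a callback, a group that did not participate is undefined.
    // g1 is set for a run at the start of a word, g3 for a run elsewhere.
    const kept = g1 !== undefined ? g1 : g3;
    trace.push({ match, branch: g1 !== undefined ? "start" : "middle/end", kept });
    return kept;
  });
  return { output, trace };
}

console.log(collapseWithTrace("ahhj aaab cc iiik"));
```

For this input the trace should list hh (middle/end), aaa (start) and iii (start), while cc is never matched.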

PHP demo on 3v4l.org

3
Casimir et Hippolyte On
let result = str.replace(/\b((.)\2+)\b|(.)\3+/g, '$1$3');

demo
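
The answer above comes without a breakdown, so here is one reading of it, wrapped in a function (collapseRuns is a made-up name) and run against the examples from the question. Note that . is not limited to a-z, so this version also collapses runs of other characters.

```javascript
// \b((.)\2+)\b : a run of one repeated character with a word boundary on both
//                sides; group 1 holds the whole run, so "$1" puts it back as is
// (.)\3+       : any other run of a repeated character; "$3" keeps one copy
function collapseRuns(str) {
  return str.replace(/\b((.)\2+)\b|(.)\3+/g, "$1$3");
}

console.log(collapseRuns("aaa baaab xxx"));     // aaa bab xxx
console.log(collapseRuns("ahhj aaab cc iiik")); // ahj ab cc ik
```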

0
The fourth bird On

You could rewrite (?<=^| ) as (?:\s|^) and keep that match in the replacement instead of using \K which is not supported in JavaScript.
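
A quick way to check what JavaScript does with \K is shown below. This is a small sketch; literalK and compilesInUnicodeMode are names chosen for this example only. Without the u flag, \K is treated as an escaped literal K, and with the u flag the pattern is rejected when it is compiled.

```javascript
// Without the u flag, \K is an identity escape: it matches a literal "K".
const literalK = /a\Kb/;
console.log(literalK.test("aKb")); // true
console.log(literalK.test("ab"));  // false

// With the u flag, unknown escapes are a SyntaxError at compile time.
function compilesInUnicodeMode(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    return false;
  }
}

console.log(compilesInUnicodeMode("a\\Kb")); // false
```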

You could write the pattern as:

((?:\s|^)([a-z])\2+)(?=\s|$)|([a-z])(?=\3)

The pattern matches:

  • ( Capture group 1
    • (?:\s|^) Match either a whitespace char or assert the start of the string
    • ([a-z])\2+ Capture a single char a-z in group 2 and repeat matching that same char 1 or more times
  • ) Close group 1
  • (?=\s|$) Positive lookahead, assert either a whitespace char or the end of the string to the right
  • | Or
  • ([a-z])(?=\3) Capture a single char a-z in group 3 while asserting the same character directly to the right

Regex demo

const regex = /((?:\s|^)([a-z])\2+)(?=\s|$)|([a-z])(?=\3)/g;

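// "$1" puts back a whole-word run captured by group 1. For a match of the
// second alternative, group 1 is empty, so the duplicate letter is dropped.
[
  "aaa baaab xxx",     // logs: aaa bab xxx
  "ahhj aaab cc iiik", // logs: ahj ab cc ik
  "#$aa$aa#aaa bbb"    // logs: #$a$a#a bbb
].forEach(s =>
  console.log(s.replace(regex, "$1"))
);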
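
On the efficiency concern from the question: the pattern has no nested quantifiers, so I would not expect catastrophic backtracking, but it is easy to measure on a string of the size you mention. The sketch below is an assumption-heavy illustration: makeSample is a made-up generator, the input is random rather than real data, and performance.now() is assumed to be available as a global (it is in browsers and in recent Node.js versions).

```javascript
const dedupePattern = /((?:\s|^)([a-z])\2+)(?=\s|$)|([a-z])(?=\3)/g;

// Hypothetical input generator: short runs of repeated letters, with a space
// inserted after roughly 30% of the runs, cut to the requested length.
function makeSample(length) {
  const letters = "abcxyz";
  let out = "";
  while (out.length < length) {
    const ch = letters[Math.floor(Math.random() * letters.length)];
    const run = 1 + Math.floor(Math.random() * 4);
    out += ch.repeat(run);
    if (Math.random() < 0.3) out += " ";
  }
  return out.slice(0, length);
}

const sample = makeSample(1000);
const started = performance.now();
const cleaned = sample.replace(dedupePattern, "$1");
const elapsedMs = performance.now() - started;

console.log(`${sample.length} chars -> ${cleaned.length} chars in ${elapsedMs.toFixed(3)} ms`);
```

The timing depends on the engine and the machine, so treat the number as a rough indication only.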


If you want to match any letter:

const regex = /((?: |^)([\p{L}\p{M}])\2+(?= |$))|([\p{L}\p{M}])(?=\3)/gu;

See another regex demo
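
As a usage sketch for the any-letter variant (anyLetter and collapseLetters are names invented for this example), here it is applied to Greek input and to one of the strings from the question. The u flag is required for \p{L} and \p{M} to be read as Unicode property escapes.

```javascript
const anyLetter = /((?: |^)([\p{L}\p{M}])\2+(?= |$))|([\p{L}\p{M}])(?=\3)/gu;

// Whole-word runs are restored through "$1"; any other duplicate letter is
// matched by the second alternative and replaced with an empty group 1.
function collapseLetters(text) {
  return text.replace(anyLetter, "$1");
}

console.log(collapseLetters("ωωω βαααβ λλλ"));     // ωωω βαβ λλλ
console.log(collapseLetters("ahhj aaab cc iiik")); // ahj ab cc ik
```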