Regex for complex uppercase-lowercase scenarios

Question

149 views Asked by Cassio Polegatto At 31 December 2022 at 14:24

I'm working on an app that adapts text to braille specifications, which have some tricky rules for handling uppercase letters, and I'd like some help. The rules are:

Before a single uppercase letter, add ":"

:This is an :Example

Before multiple uppercase letters and all caps words add another ":"

:This is ::ANOTHER ex::AMple, ::ALRIGHT

If a sequence of uppercase words is made of more than three uppercase words in a row, add "-" to the beginning of the sequence and delete all other "::" within that sequence, except for the last one

:This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example

Finally, if a word goes from uppercase to lowercase mid-word (except when the uppercase letter is the first letter of the word), add ";"

:This is my fin:A;l ::EXAM;ple

Working with regex, I was able to solve for the simple ones but not all rules.

// adds : before any uppercase
var firstChange = text.replace(/[A-Z]+/g, ':$&');

// adds : to double+ uppercase
var secondChange = firstChange.replace(/[A-Z]{2,}/g, ':$&');

// adds ; to upper-lower change
var thirdChange = secondChange.replace(/\B[A-Z]+(?=[a-z])/g, '$&;');

I was trying to build up from simple to complex, then I tried the other way around, then I tried merging some rules, either way they conflict. I'm new to regex and I could use any insight on how to solve this. Thank you.

Edit: To make it more clear, I made a final example that combines all rules.

This is an Example. This is ANOTHER exAMple, ALRIGHT? This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example. This is my finAl EXAMple.

Should become:

:This is an :Example. :This is ::ANOTHER ex::AM;ple, ::ALRIGHT? :This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example. :This is my fin:A;l ::EXAM;ple.

SOLVED: With the help of @ChrisMaurer and @SaSkY, here is the code to solve the above problem:

(edit: fixed fourth change thanks to @Sasky)

var original = document.getElementById("area1");
var another = document.getElementById("area2");

function MyFunction() {

  // include : before every uppercase
  var firstChange = original.value.replace(/[A-Z]+/g, ':$&');

  // add one more : before multiple uppercase letters and all-caps words
  var secondChange = firstChange.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');

  // add - to beginning of long uppercase sequence (four or more words)
  var thirdChange = secondChange.replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, '-$&');

  // removes extra :: before words within long uppercase sequence
  var fourthChange = thirdChange.replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, '');

  // add a lowercase symbol when it changes from uppercase to lowercase mid word
  var fifthChange = fourthChange.replace(/\B[A-Z](?=[a-z])/g, '$&;');

  // update the output textarea
  another.value = fifthChange;
}

<html>
<body>
<textarea id="area1"  rows="4" cols="40" onkeyup="MyFunction()">
</textarea>
<textarea id="area2" rows="4" cols="40"></textarea>
</body>
</html>
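
For testing outside the browser, the same five replacements can be wrapped in a plain function. This is only a sketch: the name `toBrailleMarkup` is made up here, the character classes are written as plain `[A-Z]` (an assumption about what was intended), and the fourth step needs an engine that supports lookbehind.

```javascript
// Hypothetical DOM-free wrapper around the five replacements above.
function toBrailleMarkup(text) {
  return text
    // 1. ":" before every run of uppercase letters
    .replace(/[A-Z]+/g, ":$&")
    // 2. a second ":" before runs of 2+ capitals and before all-caps words
    .replace(/[A-Z]{2,}|\b[A-Z]+\b/g, ":$&")
    // 3. "-" before a run of four or more all-caps words
    .replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, "-$&")
    // 4. drop "::" inside such a run, keeping the first and the last
    .replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, "")
    // 5. ";" after a capital that is followed by lowercase mid-word
    .replace(/\B[A-Z](?=[a-z])/g, "$&;");
}

const sample =
  "This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example.";
console.log(toBrailleMarkup(sample));
// :This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example.
```

Keeping the logic in a pure function makes it easy to compare the output against the examples from the rules, one sentence at a time.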

**Chris Maurer** · Accepted Answer · 2022-12-31T17:10:54+00:00

So I think your approach is good, and the first replace seems to get the single colons into the right place. The second one screws up on single letter words like A and I. I would fix that with an added alternation:

/([A-Z]{2,}|\b[A-Z]+\b)/g
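
A quick way to see what the added alternation changes, assuming the first replacement has already put one colon before each run of capitals. The sample sentence is made up for illustration, and the character class is written as plain `[A-Z]`.

```javascript
// Step 1: one ":" before every run of capitals.
const step1 = "This is A VERY long Example".replace(/[A-Z]+/g, ":$&");

// Step 2 without the alternation: the single-letter word "A" is missed.
const withoutAlt = step1.replace(/[A-Z]{2,}/g, ":$&");

// Step 2 with the alternation: all-caps words of any length also match.
const withAlt = step1.replace(/[A-Z]{2,}|\b[A-Z]+\b/g, ":$&");

console.log(withoutAlt); // :This is :A ::VERY long :Example
console.log(withAlt);    // :This is ::A ::VERY long :Example
```

The `\b...\b` branch does not fire on `:This` or `:Example`, because the capital there is followed by a lowercase letter rather than a word boundary.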

Now you need to add two more replacements; one to add the hyphen, and the other to remove the double colons.

For the hyphen you just search for three or more ::ALLCAPS whitespace combos like this:

/\B(::[A-Z]+(\s+::[A-Z]+){2,})/g

The \B handles caps at the very beginning of the string. I replaced with hyphen and $1.
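
The quantifier decides how long a run must be: `{2,}` means one word plus at least two more (three or more words), while `{3,}` would require four or more, which is closer to "more than three" in the question. A small sketch with a hypothetical helper, `markLongRuns`, that makes the threshold a parameter:

```javascript
// Hypothetical helper: prefix "-" to runs of at least minWords all-caps words.
// Expects text that already carries "::" before each all-caps word.
function markLongRuns(text, minWords) {
  const run = new RegExp(`\\B(::[A-Z]+(\\s+::[A-Z]+){${minWords - 1},})`, "g");
  return text.replace(run, "-$1");
}

const marked = "::ONE ::TWO ::THREE go, ::A ::B ::C ::D stop";

console.log(markLongRuns(marked, 3));
// -::ONE ::TWO ::THREE go, -::A ::B ::C ::D stop
console.log(markLongRuns(marked, 4));
// ::ONE ::TWO ::THREE go, -::A ::B ::C ::D stop
```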

To remove the double colons, I got a little trickier with a lookbehind and a lookahead:

/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g

This one is just replaced with $1. Luckily JavaScript supports variable-length lookbehinds.
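
As a rough illustration of that replacement, here it is applied to a made-up string that already has the leading hyphen. It assumes an engine with lookbehind support.

```javascript
// Hypothetical input: a long run that already carries the leading hyphen.
const hyphenated = "-::A ::VERY ::LONG ::RUN here";

// Drop "::" from words that have an all-caps word on both sides.
const trimmed = hyphenated.replace(
  /(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g,
  "$1"
);

console.log(trimmed); // -::A VERY LONG ::RUN here
```

The first word keeps its marker because nothing all-caps precedes it, and the last word keeps its marker because nothing all-caps follows it.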

Here it is working on Regex101:

I did not look at your last replacement. Superficially it seemed to be OK.
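
One case may be worth checking in that last step, though. Once the first replacement has put a ":" directly in front of a mid-word capital, `\B` no longer holds just before that letter, so a word like `fin:Al` is skipped by a pattern of the form `/\B[A-Z](?=[a-z])/g`. The sketch below shows the behaviour, along with one possible variant that uses a lookbehind tolerating the inserted colons. The variant is an assumption of this example, not something taken from the question.

```javascript
// Last replacement in the \B form.
const withB = (s) => s.replace(/\B[A-Z](?=[a-z])/g, "$&;");

// Possible variant: require a letter, then any inserted colons, before the capital.
const withLookbehind = (s) =>
  s.replace(/(?<=[A-Za-z]:*)[A-Z](?=[a-z])/g, "$&;");

console.log(withB("fin:Al ex::AMple :This"));          // fin:Al ex::AM;ple :This
console.log(withLookbehind("fin:Al ex::AMple :This")); // fin:A;l ex::AM;ple :This
```

Both versions leave `:This` alone, since no letter precedes the colon there.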
