I'm trying to create a regular expression that performs the following transformations:
Apple Orange
>AO
Load Module
>LM
anApple Orange
>O
toLoad Module
>M
I found a suitable pattern, but noticed a strange behavior. Here's my initial try:
/^([A-Z])?[^ ]* ([A-Z])/
Running a replace on the third (and fourth) test case with this expression gives me a surprising result:
'anApple Orange'.replace(/^([A-Z])?[^ ]* ([A-Z])/,'$1$2')
> "Orange"
Why is that surprising? Well, the first group obviously does not match, since the string does not start with a capital letter, but the second group only selects a single capital letter, ([A-Z]), not everything after it, ([A-Z].*).
To my surprise, adding .* right after the last capture group gave me the correct result:
'anApple Orange'.replace(/^([A-Z])?[^ ]* ([A-Z]).*/,'$1$2')
> "O"
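One way to see the difference is to compare how much of the input each pattern actually matches. This is a minimal sketch in plain JavaScript (runnable in a browser console or Node); the variable names bad, good and input are only for illustration:

```javascript
// Compare how much of the input each pattern matches.
// replace() substitutes only the matched substring (match[0]);
// anything outside it is copied into the result unchanged.
var badregex = /^([A-Z])?[^ ]* ([A-Z])/;
var goodregex = /^([A-Z])?[^ ]* ([A-Z]).*/;
var input = 'anApple Orange';

var bad = input.match(badregex);
var good = input.match(goodregex);

console.log(bad[0]);                        // anApple O       (only a prefix is matched)
console.log(input.slice(bad[0].length));    // range           (never touched by replace)
console.log(good[0]);                       // anApple Orange  (.* consumes the rest)
```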
Why this is happening is beyond my understanding of JS and regular expressions. I'd be thrilled to know what sort of dark magic is causing a single [A-Z] to return multiple characters, and even some lowercase ones.
Here's a runnable demo:
var testCases = [
      'Apple Orange',
      'Load Module',
      'anApple Orange',
      'toLoad Module'
    ],
    badregex = /^([A-Z])?[^ ]* ([A-Z])/,      // no trailing .*
    goodregex = /^([A-Z])?[^ ]* ([A-Z]).*/;   // trailing .* consumes the rest

// Append one <p> per test case showing "input > output" for the given regex.
function renderResults(regex) {
  for (var i = 0, l = testCases.length; i < l; i++) {
    var p = document.createElement('p'),
        testCase = testCases[i];
    p.textContent = testCase + " > " + testCase.replace(regex, '$1$2');
    document.body.appendChild(p);
  }
}

document.onreadystatechange = function () {
  if (document.readyState === "complete") {
    renderResults(badregex);
    document.body.appendChild(document.createElement('hr'));
    renderResults(goodregex);
  }
};
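If you want to run the same comparison without a browser (for example in Node), a sketch like the following logs the results instead of writing to the DOM. The helper name runAll is hypothetical; the patterns and test cases are the ones from the demo:

```javascript
// DOM-free variant of the demo: logs "input > bad | good" for each test case.
var testCases = ['Apple Orange', 'Load Module', 'anApple Orange', 'toLoad Module'];
var badregex = /^([A-Z])?[^ ]* ([A-Z])/;
var goodregex = /^([A-Z])?[^ ]* ([A-Z]).*/;

// Hypothetical helper: apply one regex to every test case with '$1$2'.
function runAll(regex) {
  return testCases.map(function (testCase) {
    return testCase.replace(regex, '$1$2');
  });
}

var badResults = runAll(badregex);    // [ 'AOrange', 'LModule', 'Orange', 'Module' ]
var goodResults = runAll(goodregex);  // [ 'AO', 'LM', 'O', 'M' ]

testCases.forEach(function (testCase, i) {
  console.log(testCase + ' > ' + badResults[i] + ' | ' + goodResults[i]);
});
```

Note that the pattern without .* leaves the tail of the second word in place for every test case, not just the last two: "Apple Orange" becomes AOrange.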
I would do it like this:
Don't overcomplicate things. Just capture every uppercase character that appears at the start of the string or right after a space, and then match all the remaining characters. Now replace all the matched characters with $1. Note that every matched character is replaced by whatever the replacement string contains, so characters that are matched but not captured simply disappear.
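Why does your regex return Orange?
- ^([A-Z])? checks for an optional uppercase letter at the start. There is none ("a" is lowercase), so the group does not participate in the match.
- [^ ]* matches zero or more non-space characters (anApple).
- <space> matches a space.
- ([A-Z]) captures only the first letter of Orange.
- $1 -> empty string (a group that did not participate expands to "")
- $2 -> O

So the matched text "anApple O" is replaced with "" + "O". The rest of the string ("range") is not part of the match, so replace() leaves it where it is, which gives you Orange. Adding .* makes the match extend to the end of the string, so nothing is left over.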
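The same result can be rebuilt by hand from the match object. This sketch uses RegExp.prototype.exec to expose the matched span, the groups and the match position; the variable names are illustrative only:

```javascript
// Rebuild 'anApple Orange'.replace(/^([A-Z])?[^ ]* ([A-Z])/, '$1$2') step by step.
var input = 'anApple Orange';
var m = /^([A-Z])?[^ ]* ([A-Z])/.exec(input);

console.log(m[0]);  // anApple O   (the matched span)
console.log(m[1]);  // undefined   (group 1 did not participate)
console.log(m[2]);  // O

// '$1$2' expands an undefined group to "", so the replacement is "" + "O".
var replacement = (m[1] === undefined ? '' : m[1]) + m[2];
var before = input.slice(0, m.index);             // "" (match starts at index 0)
var after = input.slice(m.index + m[0].length);   // "range"
var rebuilt = before + replacement + after;

console.log(rebuilt);  // Orange
```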