I'm trying to create a regular expression that does the following transformations:
Apple Orange > AO
Load Module > LM
anApple Orange > O
toLoad Module > M
I found a suitable pattern, but noticed a strange behavior. Here's my initial try:
/^([A-Z])?[^ ]* ([A-Z])/
Running a replace on the third (and fourth) test case with this expression gives me a surprising result:
'anApple Orange'.replace(/^([A-Z])?[^ ]* ([A-Z])/,'$1$2')
> "Orange"
Why is that surprising? Well, the first group obviously does not match, since the string does not start with a capital letter, but the second group only selects a single capital letter: ([A-Z]), not everything after it: ([A-Z].*)
To my surprise, adding .* right after the last capture group gave me the correct result:
'anApple Orange'.replace(/^([A-Z])?[^ ]* ([A-Z]).*/,'$1$2')
> "O"
Why this is happening is beyond my understanding of JS and regular expressions. I'd love to know what sort of dark magic is causing a single [A-Z] to return multiple characters, some of them even lowercase.
Here's a runnable demo:
var testCases = [
        'Apple Orange',
        'Load Module',
        'anApple Orange',
        'toLoad Module'
    ],
    badregex = /^([A-Z])?[^ ]* ([A-Z])/,
    goodregex = /^([A-Z])?[^ ]* ([A-Z]).*/;
document.onreadystatechange = function(n){
    if (document.readyState === "complete"){
        for (var i=0,l=testCases.length; i<l; i++){
            var p = document.createElement('p'),
                testCase = testCases[i];
            p.innerHTML = ""+testCase+" > "+testCase.replace(badregex,'$1$2');
            document.body.appendChild(p);
        }
        document.body.appendChild(document.createElement('hr'));
        for (var i=0,l=testCases.length; i<l; i++){
            var p = document.createElement('p'),
                testCase = testCases[i];
            p.innerHTML = ""+testCase+" > "+testCase.replace(goodregex,'$1$2');
            document.body.appendChild(p);
        }
    }
};
I would keep it simple. Capture every uppercase letter that appears at the start of the string or right after a space, then match each of the remaining characters. Now replace every match with $1. Note that replace() substitutes the entire matched text with the replacement string, not just the parts inside the capture groups.
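As a rough sketch of one pattern that fits this description (the helper name initials and the exact regex are assumptions for illustration, not necessarily the pattern originally posted), a global alternation can capture the wanted capitals and consume everything else one character at a time:

```javascript
// Hypothetical helper: one possible reading of the approach described above.
function initials(text) {
  // Alternative 1: an uppercase letter at the start of the string or right
  //                after a space; the letter is captured as $1.
  // Alternative 2: any other single character; nothing is captured, so $1
  //                is substituted as an empty string for that match.
  return text.replace(/(?:^| )([A-Z])|./g, '$1');
}

console.log(initials('Apple Orange'));   // AO
console.log(initials('Load Module'));    // LM
console.log(initials('anApple Orange')); // O
console.log(initials('toLoad Module'));  // M
```

Because of the g flag, every character of the input belongs to some match, so nothing outside the captured capitals survives the replacement.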
Why?
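([A-Z])? looks for an optional uppercase letter at the start. There is none, so the group does not participate in the match and $1 is substituted as an empty string.
[^ ]* matches zero or more non-space characters, here anApple.
<space> matches a space.
([A-Z]) captures only the first letter of Orange.
The overall match is therefore "anApple O", and only that part is replaced: $1 -> empty string, $2 -> O. The unmatched remainder "range" is left in place, which gives you Orange. With .* appended, the pattern also matches the remainder, so it is replaced as well and only O is left.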
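To see this directly, the following sketch inspects what the original pattern actually matches in the third test case, using only the regex and input from the question (the variable names are illustrative):

```javascript
var input = 'anApple Orange';
var pattern = /^([A-Z])?[^ ]* ([A-Z])/;
var m = pattern.exec(input);

console.log(JSON.stringify(m[0])); // "anApple O"  (the whole matched text)
console.log(m[1]);                 // undefined    (group 1 did not participate)
console.log(m[2]);                 // O

// replace() swaps out only m[0]; everything after the match is kept as is.
var rest = input.slice(m.index + m[0].length);  // "range"
var rebuilt = (m[1] || '') + m[2] + rest;       // "" + "O" + "range"

console.log(rebuilt);                                    // Orange
console.log(rebuilt === input.replace(pattern, '$1$2')); // true
```

The second group really does hold a single letter; the extra characters in the result come from the part of the string the pattern never matched.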