Confusion regarding the *? regular expression operator


So I want to search a string, using the below regular expression:

border-.*\.5pt

to find all border-top, border-bottom, etc CSS properties in a file with a border thickness of .5pt. It generally works great, but it's too greedy.

For example all of the below comes back as a single match:

border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt

I want those two CSS properties to be two separate matches.

So I tried to modify my regular expression to:

border-.*?\.5pt

Using ? to make it non-greedy. However, after that modification, nothing matches.

Can anyone explain why I see this behavior? What am I missing?

(If it's worth knowing, I'm using Microsoft Expression Web's 'find with regular expressions' when doing this search.)


There are 2 answers

1
Mud On BEST ANSWER

There is no one "regular expression" language. While there are broad commonalities, details differ from implementation to implementation. Most engines use *? for the non-greedy "0 or more"; some use a different character (Lua patterns use -, for example). Apparently Microsoft Expression Web uses @.

In short, regexes can differ, so you'll often need to read the manual for the one you're using to find its range of capabilities and detailed syntax (i.e. support for alternation, backtracking, etc.; grouping characters; set shorthand; and so on).
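As a point of comparison, here is a small sketch in Python, whose re module is one of the engines that does spell non-greedy "0 or more" as *?. It only shows how the two patterns from the question behave in that engine; it says nothing about Expression Web's own syntax.

```python
import re

# The sample line from the question.
css = "border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt"

# Greedy: .* runs to the end of the line, then backtracks to the LAST ".5pt".
greedy = re.findall(r"border-.*\.5pt", css)

# Non-greedy: .*? grows one character at a time and stops at the FIRST ".5pt".
lazy = re.findall(r"border-.*?\.5pt", css)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

In this engine the greedy pattern returns the whole line as one match, and the non-greedy one returns the two properties separately.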

0
dognose On

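.*? is the worst, so to say, "antipattern" in regular expressions. It is commonly used as a "match something until the string I want" pattern, but it isn't one: it only prefers the shortest match from a given starting point, and it will still run past anything it has to in order to make the rest of the pattern succeed.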

Especially when combining multiple .*? within ONE pattern, it may lead to very wrong and unexpected results.

For your case, as stated in the comments, it works. (Maybe you did something wrong?)

However, it is ALWAYS a good idea to be more specific when writing a regex pattern. ALWAYS KEEP IN MIND that .*? can match ANYTHING, including stuff you really don't want to match!
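A minimal sketch of that pitfall, again in Python. The input line is a made-up variation of the one in the question, in which only the second property has a .5pt width; the [^;] variant is just one possible way to be more specific.

```python
import re

# Hypothetical input: border-top is 1pt, only border-bottom is .5pt.
css = "border-top:solid #1F497D 1pt;border-bottom:solid #1F497D .5pt"

# .*? starts at "border-top" and keeps expanding, across the ";",
# until it finally reaches a ".5pt" that belongs to another property.
loose = re.findall(r"border-.*?\.5pt", css)

# [^;]*? cannot cross a ";", so a match stays inside one property.
strict = re.findall(r"border-[^;]*?\.5pt", css)

print(loose)
print(strict)
```

Here the loose pattern returns the entire line as a single match, even though border-top is not .5pt, while the stricter pattern returns only the border-bottom property.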

In your example, I would use something like this: border-(?:[^:]+):\s*(?:[^\s]+)\s+(?:\#[a-fA-F0-9]{6})\s+(?:\d*(?:\.\d+)?)pt;?

It is more specific but still meets the given requirements, tolerates any whitespace that doesn't matter, and even matches border widths regardless of whether they are written as .2, 3 or 4.1. If you remove the ?: from the individual groups, you can also capture every single attribute, if required: position, border type, color and thickness.

The pattern border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;? with your string border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt will match:

First Match:

1. top
2. solid
3. #1F497D
4. .5

Second Match:

1. bottom
2. solid
3. #1F497D
4. .5
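The capturing pattern can be tried out with a short Python sketch. Python's re module is an assumption here, not the engine used in Expression Web, and the variable names are only illustrative.

```python
import re

# The capturing pattern from above, unchanged.
pattern = re.compile(
    r"border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;?"
)

css = "border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt"

# findall returns one tuple of the four groups per match.
matches = pattern.findall(css)

for position, style, color, width in matches:
    print(position, style, color, width)
```

Each tuple holds the position, border type, color and thickness of one property, in that order.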