I have a string
1234X5678
and I match it against this regex
.X|..X|X.
I got
34X
The question is: why didn't I get 4X or X5?
Why did the regex engine choose the second alternative?
The main point here is:
The regex engine scans the input from LEFT TO RIGHT by default.
So, you have the alternation pattern
.X|..X|X.
and you run it against
1234X5678
See what happens: each alternative branch is tested at each location in the string, from left to right.
Steps 1-7 show how the engine tries to match the characters at the beginning of the string. However, none of the branches (.X, ..X, or X.) matches 12 or 123.
Steps 8-13 just repeat the same failing scenario at the next location, as none of the branches matches 23 or 234.
Steps 14-19 show the success scenario, because 34X can be matched with Branch 2 (..X).
The regex engine does not reach the location before 4, since that character has already been matched and consumed.
And another conclusion:
The order of alternatives matters: in NFA (backtracking) regex engines, the first alternative that matches at a given location wins. BUT the winning alternative does not have to be the one listed first, nor the shortest one: a longer alternative listed later can win if its match starts at an earlier location in the string.
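Both conclusions can be checked with a short sketch. It uses Python's re module as one example of a backtracking engine; the helper first_match is a hypothetical name introduced here only to mimic the "each branch at each location" loop, and it is not how the engine is actually implemented.

```python
import re

text = "1234X5678"
branches = [".X", "..X", "X."]  # the three alternatives, in pattern order


def first_match(text, branches):
    """Mimic the engine: at each start location (left to right),
    try each branch in order and stop at the first success."""
    for start in range(len(text)):
        for branch in branches:
            m = re.compile(branch).match(text, start)
            if m:
                return start, branch, m.group()
    return None


# The simulated search succeeds at location 2 with the second branch.
print(first_match(text, branches))                   # (2, '..X', '34X')

# The real engine agrees with the simulation.
print(re.search(".X|..X|X.", text).group())          # 34X

# After 34X is consumed, no X is left, so 4X and X5 are never found.
print(re.findall(".X|..X|X.", text))                 # ['34X']

# Reordering the branches does not help here: at locations 0 and 1
# no branch matches, and at location 2 only ..X does.
print(re.search("X.|.X|..X", text).group())          # 34X

# Order does matter when several branches match at the SAME location.
print(re.search("a|ab", "ab").group())               # a
print(re.search("ab|a", "ab").group())               # ab
```

To get 4X or X5 instead, the match would have to be forced to start later, for example by searching with only the .X or X. branch.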