I thought I knew about regex... Here's the situation:
N-U0 U0-M1
M1-T9 T9-R10 R10-E19
E19-L100 L100-B
I have a String that contains groups (let's call them transitions) separated by whitespace (one or more characters, which may or may not be line breaks; I treat them all the same). Each transition is composed of two parts (let's call them exiting and entering) separated by a hyphen. Each part is either a single character (N for exiting, B for entering) or a specific letter followed by a number of one or more digits.
I want to run a regex match that will give me one object for each transition and then, for each object, I want access to each part of the transition by means of named capture groups.
These are the regexes I've written:
static RegExp regex = RegExp(
r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))\s+',
);
static RegExp exitingRegex = RegExp(
r'(?<exitingN>N)|((?<exitingF>[UMTREL]{1})(?<exitingNumber>[0-9]+))-',
);
static RegExp enteringRegex = RegExp(
r'-(?<enteringB>B)|((?<enteringF>[UMTREL]{1})(?<enteringNumber>[0-9]+))',
);
When I run
final matchList = regex.allMatches(
"N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n",
);
it doesn't work as I expect. It matches the first N, then the first U0, then the first M1, and so on until the first L100 and the B. I was expecting it to match N-U0, then U0-M1, and so on. At least matchList.elementAt(0).namedGroup("exitingN") etc. works, but I want the exiting and the entering parts together in a single match.
I tried wrapping the whole regex in another group, both with and without ?: (to make it non-capturing), plus a few other variations, I think, but nothing worked.
Then I tested with exitingRegex only and it worked as expected, matching every exiting. However, enteringRegex didn't work. It matched every exiting and every entering except for N.
The only way I managed to make it work was to match with exitingRegex and then, for the entering, I had to first use "N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n".replaceAll(exitingRegex, "",) and then match with enteringRegex but without the leading hyphen. This way, I got the exiting and the entering separately, which I have to join later by index.
What's going on?
Thanks in advance.
To limit the branches separated by |, wrap them in a group. This group can be capturing (()) or non-capturing ((?:)), depending on what you need. In your regex, that means each side of the hyphen needs a group of its own around its alternatives; otherwise | splits the whole pattern into independent branches, which is why every part was matched separately.

For an input of U0-M1, the corrected regex matches and returns the following groups:
U0-M1
U0
exitingF: U
exitingNumber: 0

Do note that I removed those unnecessary {1} quantifiers, because an expression always matches exactly one instance of itself by default.

Try it on regex101.com.
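As a rough illustration of the grouping advice, here is a minimal Dart sketch that reuses the named groups from the question. The names transitionRegex and transitions are hypothetical, and the choice of non-capturing groups is an assumption (capturing groups would work the same way for matching, but would add numbered groups to each match). The trailing \s+ is left out here on the assumption that allMatches can simply skip the whitespace between transitions.

```dart
// Sketch only: each side of the hyphen gets its own non-capturing group,
// so `|` chooses between the branches on that side and nothing else.
final RegExp transitionRegex = RegExp(
  r'(?:(?<exitingN>N)|(?<exitingF>[UMTREL])(?<exitingNumber>[0-9]+))'
  r'-'
  r'(?:(?<enteringB>B)|(?<enteringF>[UMTREL])(?<enteringNumber>[0-9]+))',
);

// Hypothetical helper: the full text of every transition found in the input.
List<String> transitions(String input) =>
    transitionRegex.allMatches(input).map((m) => m.group(0)!).toList();

void main() {
  const input = 'N-U0 U0-M1\nM1-T9 T9-R10 R10-E19\nE19-L100 L100-B\n';
  for (final m in transitionRegex.allMatches(input)) {
    // namedGroup returns null for groups in the branch that did not
    // participate, so fall back to the letter-plus-number groups.
    final exiting = m.namedGroup('exitingN') ??
        '${m.namedGroup('exitingF')}${m.namedGroup('exitingNumber')}';
    final entering = m.namedGroup('enteringB') ??
        '${m.namedGroup('enteringF')}${m.namedGroup('enteringNumber')}';
    print('$exiting -> $entering');
  }
}
```

With this shape, each match covers one whole transition, so the exiting and entering parts no longer need to be joined by index afterwards.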