I have been trying to resolve this for the past 2 days...
Please help me in understanding why this is happening. My intention is to just select the <HDR>
that has a <DTL1 val="92">.....</HDR>
This is my regular expression
(?<=<HDR>).*?<DTL1\sval="3".*?</HDR>
And the input string is:
<HDR>abc<DTL1 val="1"><DTL2 val="2"></HDR><HDR><DTL1 val="92"><DTL2 val="55"></HDR><HDR><DTL1 val="3"><DTL2 val="4"></HDR>
But this regular expression selects
abc<DTL1 val="1"><DTL2 val="2"></HDR><HDR><DTL1 val="92"><DTL2 val="55"></HDR>
Can anyone please help me?
Casimir et Hippolyte already gave you a couple of good solutions. I want to elaborate on a few things.
First, why your regex fails to do what you want:
(?<=<HDR>).*?
tells it to match any number of characters starting with the first character preceded by<HDR>
, until it encounters what follows the non-greedy quantifier (<DTL1...
). Well, the first character that's preceded by<HDR>
is the firsta
, so it matches everything starting from there until the fixed string<DTL1\sval="3"
is encountered.Casimir et Hippolyte's solutions are for the generalized case, where the contents of the <HDR> tags can be anything other than nested <HDR>'s. You could also do it with a positive look-ahead:
However, if the string is guaranteed to be in the structure shown, where the <HDR> tags only contain one or more <DTL1 val="##"> tags, so you know there won't be any closing tags within, you could do it more efficiently by replacing the first
.*?
with[^/]*
:A negated character class is more efficient than a zero-width assertion, and if you're using a negated character class, a greedy quantifier becomes more efficient than a lazy one.
Note also that by using a lookbehind to match the opening <HDR>, you're excluding it from the match, but you're including the closing </HDR>. Are you sure that's what you want? You're matching this...
...when presumably you want this...
...or this...
So, in the fist case, don't use a lookbehind for the opening tag:
In the second case, use a look-ahead for the closing tag: