Match beginning of the line only if pattern is found anywhere after the first part of the positive lookahead


I'm having a hard time with VS Code's Oniguruma regex parsing for TextMate grammars. Apparently you can't match a newline inside a lookahead. Oniguruma itself supports this, so I suspect it's just not enabled in VS Code's build of Oniguruma.
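One likely explanation, assuming VS Code follows the TextMate grammar model in which each pattern is matched against a single line of the document at a time, is that the lookahead never sees the next line at all. The sketch below simulates that with Python's re module; the per-line loop is a hypothetical stand-in, not VS Code's actual tokenizer.

```python
import re

# A lookahead that has to look past the end of the first line.
pattern = re.compile(r"^(?=<element\s+attribute)")

document = '<element\n   attribute="value"\n>'

# Whole-text matching (like an online tester): \s+ can consume the newline.
whole_text_hit = pattern.search(document) is not None

# Hypothetical line-at-a-time matching: the first line ends right after
# "<element", so \s+ has nothing to consume and the lookahead fails.
per_line_hit = any(pattern.search(line) for line in document.splitlines())

print(whole_text_hit, per_line_hit)  # True False
```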

I need to match the beginning of the line if, and only if, the element tag that follows it contains desiredAttr1="desiredValue1" or desiredAttr2="desiredValue2":

<element attribute="value" desiredAttr1="desiredValue1" desiredAttr2="desiredValue2">

So far so good, but these attributes can appear in any order, and there can be newlines between them. E.g.:

<!-- Should match -->
<element
   attribute="value"
   desiredAttr1="desiredValue1"
   desiredAttr2="desiredValue2"
>

<!-- Should match -->
<element
   attribute="value"
   desiredAttr2="desiredValue2"
>

<!-- Should match -->
<element attribute="value" desiredAttr1="desiredValue1">

<!-- Should match -->
<element desiredAttr2="desiredValue2" attribute="value">

<!-- Should NOT match -->
<element
   attribute="value"
   notDesiredAttr1="desiredValue1"
   notDesiredAttr2="desiredValue2"
>

This is what I have so far (it works on Rubular):

/(^[\t]+)?(?=<(?i:element)\b(?!-)[\s\w\W]*(?:((desiredAttr1="desiredValue1")|(desiredAttr2="desiredValue2"))))/
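To check this pattern outside VS Code, here is a sketch that runs it with Python's re module against the examples above. Assumptions: re stands in for Rubular's Ruby engine, and the second alternative is written as desiredAttr2="desiredValue2" so that it matches the sample attributes.

```python
import re

# The question's pattern in Python's re (a stand-in for Rubular/Oniguruma).
# re.MULTILINE makes ^ match at every line start, as it does in Ruby.
asker = re.compile(
    r'(^[\t]+)?(?=<(?i:element)\b(?!-)[\s\w\W]*'
    r'(?:((desiredAttr1="desiredValue1")|(desiredAttr2="desiredValue2"))))',
    re.MULTILINE,
)

# (text, should_match) pairs taken from the examples in the question.
samples = [
    ('<element\n   attribute="value"\n   desiredAttr1="desiredValue1"\n'
     '   desiredAttr2="desiredValue2"\n>', True),
    ('<element\n   attribute="value"\n   desiredAttr2="desiredValue2"\n>', True),
    ('<element attribute="value" desiredAttr1="desiredValue1">', True),
    ('<element desiredAttr2="desiredValue2" attribute="value">', True),
    ('<element\n   attribute="value"\n   notDesiredAttr1="desiredValue1"\n'
     '   notDesiredAttr2="desiredValue2"\n>', False),
]

for text, expected in samples:
    # Whole-text search: [\s\w\W]* happily crosses newlines here.
    assert (asker.search(text) is not None) == expected

# [\s\w\W]* also runs past ">" into the next element, so an element
# without the attributes still matches when a later element has them.
leak = asker.search('<element attribute="value">\n<element desiredAttr1="desiredValue1">')
print(leak.start())  # 0, i.e. the first element
```

The last search shows a side effect of [\s\w\W]*: the lookahead is not confined to the current tag.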

Note: I also tried replacing \s with [:space:] and with [^/].

This is what I need to match: the start of the line (including any leading indentation) before <element, in:

<element
  attribute="value"
  desiredAttr1="desiredValue1"
  desiredAttr2="desiredValue2"
>

Is there any other alternative I could use? Thanks in advance.


Answer by The fourth bird:

Assuming there are no angle brackets in between, you could use:

^[\p{Zs}\t]*(?=<element\b[^<>\r]*\bdesiredAttr([12])="desiredValue\1"[^<>\r]*>)

The pattern matches:

  • ^ Start of the line
  • [\p{Zs}\t]* Match optional space separators or tabs
  • (?= Positive lookahead
    • <element\b Match element followed by a word boundary
    • [^<>\r]* Optionally repeat matching any char except < > or \r (\n is not excluded, so this can span lines)
    • \bdesiredAttr([12])= Match desiredAttr and capture either 1 or 2 in group 1
    • "desiredValue\1" Match "desiredValue\1", where \1 is a backreference to the digit captured in group 1 (so both digits must be the same)
    • [^<>\r]* Optionally repeat matching any char except < > or \r
    • > Match literally
  • ) Close the lookahead
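The pattern can be checked against the question's examples with a short Python sketch. It is an approximation under stated assumptions: Python's re has no \p{Zs}, so [ \t] is used in its place, and re stands in for Oniguruma.

```python
import re

# The answer's pattern, approximated for Python's re: [\p{Zs}\t] becomes
# [ \t] (assumption). re.MULTILINE lets ^ match at every line start.
answer = re.compile(
    r'^[ \t]*(?=<element\b[^<>\r]*\bdesiredAttr([12])="desiredValue\1"[^<>\r]*>)',
    re.MULTILINE,
)

def matches(text: str) -> bool:
    """True if some line start in text satisfies the lookahead."""
    return answer.search(text) is not None

# [^<>\r]* does not exclude \n, so attributes on later lines still count.
assert matches('<element\n   attribute="value"\n   desiredAttr2="desiredValue2"\n>')
assert matches('<element desiredAttr2="desiredValue2" attribute="value">')
assert not matches('<element\n   attribute="value"\n   notDesiredAttr1="desiredValue1"\n>')

# The backreference \1 forces the same digit in the name and the value.
assert not matches('<element desiredAttr1="desiredValue2">')

# [^<>\r]* cannot run past ">", so a later element's attributes do not
# count for an earlier one; the match lands on the second line instead.
text = '<element attribute="value">\n<element desiredAttr1="desiredValue1">'
print(answer.search(text).start() == text.index("\n") + 1)  # True
```

Like the regex demo, this matches against whole strings; it does not model a tokenizer that only sees one line at a time.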

See a regex demo.
