PCRE to RE2 regex conversion with negative lookahead


I have a PCRE regex that I am trying to convert to RE2. Here are the PCRE pattern and the string to match against.

\%(?!$|\W)

It matches only the % itself, and it does not match when the % is followed by the end of the string or a non-word character.

%252525253E%252553Csvg%25252525252525252Fonload%252525252525252525252525252525252525252525252525252525253Dalert(document.domain)%252525252

Result: % % % %

My best conversion is this:

\%[^!$|\W]

Result: %2 %3 %3 %2 %3 %3

This, however, also matches the first digit after the %, which I do not want; I'd like it to behave exactly like the PCRE version. This is where I test:

regex-golang DOT appspot DOT com/assets/html/index.html

regex101 DOT com

Any help will be appreciated.

G.Margaritis On BEST ANSWER

You could try something like this:

(\%)(?:[^!$|\W])

Go's regexp package uses RE2 syntax, which has no negative lookahead, so you can let a non-capturing group consume the following character instead. In this example you then need to read the first capturing group (e.g. matches[1] rather than matches[0]): https://regex101.com/r/THTWwB/2

EDIT: A more detailed example in golang to help you understand the above regex is the following:

package main

import (
    "fmt"
    "regexp"
)

func main() {
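    // Group 1 captures the %; the non-capturing group consumes the word character after it.
    r := regexp.MustCompile(`(\%)(?:[^!$|\W])`)
    input := `%252525253E%252553Csvg%25252525252525252Fonload%252525252525252525252525252525252525252525252525252525253Dalert(document.domain)%252525252`
    // -1 asks for every match; each element of m holds the full match followed by group 1.
    m := r.FindAllStringSubmatch(input, -1)
    fmt.Printf("%#v\n", m)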
}

In this example you can get the % from the first capturing group. For example, m[0][0] is %2 but m[0][1] is just % (the first capturing group). Note that the first index selects the match: the first match is stored in m[0], the second in m[1], and so on.
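If you only need the % characters and where they sit in the input, a minimal sketch along the following lines might help. It reuses the same regex and reads the offsets of group 1 via FindAllStringSubmatchIndex. The helper name percentOffsets and the shortened input string are made up for illustration and are not part of the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// percentRe is the regex from the answer: group 1 is the "%", and the
// non-capturing group consumes the word character that follows it.
var percentRe = regexp.MustCompile(`(\%)(?:[^!$|\W])`)

// percentOffsets is a hypothetical helper. It returns the byte offset of
// every "%" that is immediately followed by a word character.
func percentOffsets(s string) []int {
	var offsets []int
	// Each loc is [fullStart, fullEnd, group1Start, group1End].
	for _, loc := range percentRe.FindAllStringSubmatchIndex(s, -1) {
		offsets = append(offsets, loc[2])
	}
	return offsets
}

func main() {
	// Shortened, made-up input; the trailing "%" has nothing after it.
	input := `%3Csvg%2Fonload%3Dalert(1)%`
	for _, off := range percentOffsets(input) {
		fmt.Printf("%q at byte %d\n", input[off:off+1], off)
	}
}
```

The character that the non-capturing group consumes is always a word character, so it can never be a % itself. That means consuming it should not cause a later % to be skipped, and group 1 should line up with what the PCRE lookahead version matches.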