How to check for and extract words from a URL

Documentation for Go's built-in regex pkg is here: https://golang.org/pkg/regexp/ Regex tester in Go here: https://regoio.herokuapp.com

I have a list of predefined words:

christmas, santa, tree  ( -> the order here is important. Check for words from left to right)

I am trying to check for one of the above words in different url strings:

/api/container/:containerID/santa           ( -> I want back santa)
/api/tree/:containerID/                     ( -> I want back tree)
/api/tree/:containerID/christmas            ( -> I want back christmas, not tree)

The regex I have tried is:

re := regexp.MustCompile(`^(christmas)|(santa)|(tree)$`)
      fmt.Println("santa? ", string(re.Find([]byte(`/api/container/:containerID/santa`))))
      // output OK: santa? santa
      fmt.Println("tree? ", string(re.Find([]byte(`/api/tree/:containerID/`))))  
      // output FAIL/EMPTY: tree? 
      fmt.Println("christmas? ", string(re.Find([]byte(`/api/tree/:containerID/christmas`))))  
      // output FAIL/EMPTY: christmas? 

I have also tried the following, but that gives back the whole string, and not the words I am looking for:

re := regexp.MustCompile(`^.*(christmas).*|.*(santa).*|.*(tree).*$`)
      fmt.Println("santa? ", string(re.Find([]byte(`/api/container/:containerID/santa`))))
      // output FAIL/WHOLE URL BACK: santa? /api/container/:containerID/santa
      fmt.Println("tree? ", string(re.Find([]byte(`/api/tree/:containerID/`))))
      // output FAIL/WHOLE URL BACK: tree? /api/tree/:containerID/
      fmt.Println("christmas? ", string(re.Find([]byte(`/api/tree/:containerID/christmas`))))
      // output FAIL/WHOLE URL BACK: christmas? /api/tree/:containerID/christmas

I do not know what is wrong with the last expression; I thought the regex "engine" should only remember the things inside the parentheses.

There is 1 answer

Jonathan Hall

Don't use a regular expression for this task. It's overly complex, hard to reason about (as you now know firsthand), and likely slower than plain string comparison. A much simpler approach is to loop over each path segment and look for a match:

needles := []string{"christmas", "santa", "tree"}
sampleURL := `/api/container/:containerID/santa`
for _, part := range strings.Split(sampleURL, "/") {
    for _, needle := range needles {
        if part == needle {
            fmt.Printf("found %s\n", needle)
        }
    }
}
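
Note that the loop above reports every matching segment in path order, so for `/api/tree/:containerID/christmas` it prints tree before christmas. If the left-to-right priority of the word list matters, one option is to iterate over the words in the outer loop and stop at the first hit. This is a minimal sketch; the helper name `firstMatch` is hypothetical and chosen only for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// firstMatch (hypothetical helper) returns the highest-priority needle that
// appears as a whole path segment of url, or "" if none does.
// needles must be ordered from highest to lowest priority.
func firstMatch(url string, needles []string) string {
	parts := strings.Split(url, "/")
	for _, needle := range needles {
		for _, part := range parts {
			if part == needle {
				return needle
			}
		}
	}
	return ""
}

func main() {
	needles := []string{"christmas", "santa", "tree"}
	fmt.Println(firstMatch("/api/container/:containerID/santa", needles))
	fmt.Println(firstMatch("/api/tree/:containerID/", needles))
	fmt.Println(firstMatch("/api/tree/:containerID/christmas", needles))
}
```

Because the words drive the outer loop, christmas wins over tree whenever both occur in the same URL, regardless of where they sit in the path.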

If you have a lot of words you're searching for, efficiency may be improved by using a map:

needles := []string{"christmas", "santa", "tree", "reindeer", "bells", "choir", /* and possibly hundreds more */ }
needleMap := make(map[string]struct{}, len(needles))
for _, needle := range needles {
    needleMap[needle] = struct{}{}
}

sampleURL := `/api/container/:containerID/santa`

for _, part := range strings.Split(sampleURL, "/") {
    if _, ok := needleMap[part]; ok {
        fmt.Printf("found %s\n", part)
    }
}
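
A plain set like the one above does not record which word has priority. If the order of the word list should still decide the result, one possible variation is to store each word's position in the map and keep the match with the lowest position. This is a sketch under that assumption; `bestMatch` and `rank` are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// bestMatch (hypothetical helper) returns the path segment of url that has
// the lowest rank in the rank map, or "" if no segment is a known word.
func bestMatch(url string, rank map[string]int) string {
	best, bestRank := "", -1
	for _, part := range strings.Split(url, "/") {
		// Keep this segment if it is a known word and outranks the current best.
		if r, ok := rank[part]; ok && (bestRank == -1 || r < bestRank) {
			best, bestRank = part, r
		}
	}
	return best
}

func main() {
	needles := []string{"christmas", "santa", "tree"}

	// Map each word to its index in the list; a lower index means higher priority.
	rank := make(map[string]int, len(needles))
	for i, needle := range needles {
		rank[needle] = i
	}

	fmt.Println(bestMatch("/api/tree/:containerID/christmas", rank))
}
```

This keeps a single pass over the path segments with one map lookup each, while still preferring christmas over tree when both are present.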