Python 3 regex string matching: ignore whitespace and string.punctuation


I am new to regex and would like to know how to pattern match two strings. The use case is finding a certain phrase in some text. I'm using Python 3.7, if that makes a difference.

phrase = "some phrase" #the phrase I'm searching for

Possible matches:

text = "some#@$#phrase"
            ^^^^ #non-alphanumeric can be treated like a single space
text = "some   phrase"
text = "!!!some!!! phrase!!!"

These are not matches:

text = "some phrases"
                   ^ #the 's' on the end makes it false
text = "ssome phrase"
text = "some other phrase"

I have tried using something like:

re.search(r'\b'+phrase+'\b', text)

I would very much appreciate an explanation of why the regex works if you provide a valid solution.

There is 1 answer

Juan Ignacio Sánchez (best answer)

You should use something like this:

re.search(r'\bsome\W+phrase\b', text)
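  • '\b' matches a word boundary, so extra letters on either side ("ssome", "phrases") prevent a match

  • '\W' matches any non-word character, that is, anything other than a letter, a digit, or an underscore

  • '+' means one or more times, so any run of spaces or punctuation between the two words counts as a single separator

  • the pattern must be a raw string (r'...'); in the attempt from the question the trailing '\b' is not raw, so Python reads it as a backspace character instead of a word boundary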
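As a quick check, here is a small sketch that runs the pattern against the six example strings from the question and shows the difference between '\b' with and without the raw-string prefix. The list names texts_match and texts_no_match are only illustrative.

```python
import re

# Example strings taken from the question.
texts_match = ["some#@$#phrase", "some   phrase", "!!!some!!! phrase!!!"]
texts_no_match = ["some phrases", "ssome phrase", "some other phrase"]

# \b = word boundary, \W+ = one or more non-word characters.
pattern = re.compile(r'\bsome\W+phrase\b')

for text in texts_match + texts_no_match:
    # bool(...) is True when the pattern is found somewhere in the text.
    print(repr(text), bool(pattern.search(text)))

# Without the r prefix, '\b' is a single backspace character (\x08),
# not the two-character regex token for a word boundary.
print(len('\b'), len(r'\b'))  # 1 2
```

The first three strings should report True and the last three False.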

If the phrase is stored in a variable, you can build the pattern from it first:

pattern = phrase.replace(' ', r'\W+')
re.search(r'\b' + pattern + r'\b', text)
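If the phrase may itself contain regex metacharacters (for example a dot or a parenthesis), a plain replace is not enough. One possible sketch, assuming the phrase starts and ends with a word character, is to escape each word with re.escape and join the words with \W+. The helper name phrase_pattern is hypothetical and not part of the original answer.

```python
import re

def phrase_pattern(phrase):
    """Hypothetical helper: compile a regex that matches the words of
    `phrase` separated by one or more non-word characters."""
    words = phrase.split()  # split on any run of whitespace
    # Escape each word so characters like '.' are matched literally.
    body = r'\W+'.join(re.escape(word) for word in words)
    # Word boundaries reject "ssome phrase" and "some phrases".
    return re.compile(r'\b' + body + r'\b')

pattern = phrase_pattern("some phrase")
print(pattern.pattern)  # \bsome\W+phrase\b
print(bool(pattern.search("!!!some!!! phrase!!!")))  # True
print(bool(pattern.search("some other phrase")))     # False
```

Note that '\W' does not match an underscore, because '_' counts as a word character. If underscores should also act as separators, '[\W_]+' is one alternative to consider.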