what is wrong with my word boundary regex?

Question

1.8k views · Asked by sniperd on 18 September 2017 at 14:25

I have the following little Python script:

import re

def main():
    thename = "DAVID M. D.D.S."
    theregex = re.compile(r"\bD\.D\.S\.\b")
    # Search the name with the compiled pattern
    if theregex.search(thename):
        print("you did it")

main()

It does not match. But if I adjust the regex slightly and remove the last \. it does work, like this:

\bD\.D\.S\b

I feel I'm pretty good at understanding regexes, but this has me baffled. My understanding of \b (word boundary) is that it should be a zero-width match of a non-alphanumeric (and non-underscore) character. So I would expect

"\bD\.D\.S\.\b"

to match:

D.D.S.

What am I missing?

There are 2 answers

codeonly On 18 September 2017 at 15:18

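\.\b matches the dot in .bla: it requires a word character right after the dot.
\.\B is the opposite: it matches the dot in bla. but not in bla.bla, because it requires a non-word character (or the end of the string) right after the dot.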

\bD\.D\.S\.\B
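
As a quick check of the \B variant above, here is a minimal sketch in Python. The sample strings are made up for illustration and are not from the question, apart from the first one.

```python
import re

# \B asserts "not a word boundary". After the final dot it succeeds when the
# next character is a non-word character or when the string ends.
pattern = re.compile(r"\bD\.D\.S\.\B")

# Illustrative inputs (only the first comes from the question).
samples = ["DAVID M. D.D.S.", "DAVID M. D.D.S. (retired)", "D.D.S.S"]

results = {text: bool(pattern.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The first two strings match and the last one does not, since a word character directly follows the final dot there.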

**Adam Katz** · Accepted Answer · 2017-09-18T14:34:41+00:00

This doesn't do what you might think it does.

r"\bD\.D\.S\.\b"

Here is an explanation of that regex, with the same examples that are listed below:

D.D.S.   # no match, as there is no word boundary after the final dot
D.D.S.S  # matches since there is a word boundary between `.` and `S` at the end

Word boundaries are zero-width matchers between word characters (\w, which is [0-9A-Za-z_] plus other "letters" as defined by your locale) and non-word characters (\W, which is the inversion of the previous class). Dot (.) is not a word character, so D.D.S. (note trailing whitespace) has word boundaries (only!) in the following places: \bD\b.\bD\b.\bS\b. (I didn't escape the dots because I'm illustrating the word boundaries, not making a regular expression).
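
To see those boundary positions concretely, the following sketch lists every index where \b matches in the string "D.D.S. " (with the trailing space) and marks each one with a | character. The variable names are only illustrative.

```python
import re

text = "D.D.S. "  # note the trailing space

# \b is zero-width, so each match has start == end; collect the positions.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# Rebuild the string with a "|" inserted at every boundary position.
marked = "".join(
    ("|" if i in boundaries else "") + ch for i, ch in enumerate(text)
)

print(boundaries)
print(repr(marked))
```

There is no boundary after the final dot, because the dot and the space that follows it are both non-word characters.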

I assume you are trying to match an end of line or whitespace. There are two ways to do that:

r"\bD\.D\.S\.(?!\S)"   # by negation: do not match a non-whitespace
r"\bD\.D\.S\.(?:\s|$)" # match either a whitespace character or end of line
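
A small sketch comparing the two patterns, assuming Python's re module with default flags. The sample strings other than the first are invented for illustration.

```python
import re

negation = re.compile(r"\bD\.D\.S\.(?!\S)")       # lookahead, consumes nothing
alternation = re.compile(r"\bD\.D\.S\.(?:\s|$)")  # may consume one whitespace

# Illustrative inputs (only the first comes from the question).
samples = ["DAVID M. D.D.S.", "D.D.S. and more", "D.D.S.S", "D.D.S.,"]

def matched_text(pattern, text):
    """Return the matched substring, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

by_negation = {s: matched_text(negation, s) for s in samples}
by_alternation = {s: matched_text(alternation, s) for s in samples}

for s in samples:
    print(repr(s), repr(by_negation[s]), repr(by_alternation[s]))
```

Both patterns accept and reject the same strings here. The difference is that the alternation form includes the following whitespace character in the match, while the lookahead form does not, which matters if the matched text is extracted or replaced.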

I've refined the above regex explanation link to explain the negation example above (note the first ends in …/1 while the second ends in …/2; feel free to further experiment there, it is nice and interactive).
