I have the following little Python script:
import re

def main():
    thename = "DAVID M. D.D.S."
    theregex = re.compile(r"\bD\.D\.S\.\b")
    if re.search(theregex, thename):
        print("you did it")

main()
It's not matching. But if I adjust the regex slightly and remove the final \. it does match, like this:
\bD\.D\.S\b
I feel I'm pretty good at understanding regexes, but this has me baffled. My understanding is that \b (word boundary) is a zero-width match next to a non-alphanumeric (and non-underscore) character. So I would expect
"\bD\.D\.S\.\b"
to match:
D.D.S.
What am I missing?
This doesn't do what you might think it does.
Here is an explanation of that regex, with the same examples that are listed below:
Word boundaries are zero-width matchers between word characters (\w, which is [0-9A-Za-z_] plus other "letters" as defined by your locale) and non-word characters (\W, which is the inversion of the previous class). Dot (.) is not a word character, so D.D.S. (note trailing whitespace) has word boundaries (only!) in the following places: \bD\b.\bD\b.\bS\b. (I didn't escape the dots because I'm illustrating the word boundaries, not making a regular expression.) Both the final dot and whatever follows it (whitespace or the end of the string) are non-word characters, so there is no boundary after that dot and the trailing \b in your regex fails.

I assume you are trying to match an end of line or whitespace. There are two ways to do that:
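A minimal sketch of two such patterns, assuming a plain alternation and a negative lookahead are acceptable for your input. The variable names are made up for illustration, and these forms are just one way to write the idea. The sketch also lists the offsets where \b actually matches in the string from the question.

```python
import re

name = "DAVID M. D.D.S."

# Offsets where \b matches in the string from the question.
# There is no boundary at offset 15 (after the final dot).
boundaries = [m.start() for m in re.finditer(r"\b", name)]

# The pattern from the question: the trailing \b can never match here.
original = re.compile(r"\bD\.D\.S\.\b")

# Option 1 (assumed form): require whitespace or the end of the string.
# Note that this consumes the whitespace character when there is one.
with_alternation = re.compile(r"\bD\.D\.S\.(?:\s|$)")

# Option 2 (assumed form): negative lookahead, "not followed by a
# non-whitespace character". It is zero-width, so nothing extra is consumed.
with_lookahead = re.compile(r"\bD\.D\.S\.(?!\S)")

if __name__ == "__main__":
    print(boundaries)
    for label, pattern in [("original", original),
                           ("alternation", with_alternation),
                           ("lookahead", with_lookahead)]:
        print(label, bool(pattern.search(name)))
```

The lookahead variant is usually the more convenient of the two when you need the matched text itself, since the match is exactly D.D.S. with no trailing whitespace attached.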
I've refined the above regex explanation link to explain the negation example above (note the first ends in …/1 while the second ends in …/2; feel free to further experiment there, it is nice and interactive).