How can I rewrite my anchor to be more general and correct in all situations? I have understood that using \b as an anchor is not optimal because it is implementation-dependent.
My goal is to match some type of word in a text file. For my question, the word to match is not of importance.
Assume \b is the word boundary anchor and that a word character is [a-zA-Z0-9_].
I constructed two anchors, one for the left and one for the right side of the regex. Notice how I handle the underscore, as I don't want it to be a word character when I read my text file.
(?<=\b|_) positive lookbehind
(?=\b|_) positive lookahead
What would be the equivalent anchor constructs using the more general caret ^ and dollar sign $ to get the same effect?
[The OP did not specify which regex language they are using. This answer uses Perl's regex language, but the final solution should be easy to translate into other languages. Also, I use whitespace as if the x flag was provided, but that is also easily adjusted.]
With the help of a comment made by the OP, the following is my understanding of the question:
You can use the following:
An explanation follows.
\b is equivalent to (?: (?<!\w)(?=\w) | (?<=\w)(?!\w) ).
\b \w+ \b is therefore equivalent to (?<!\w) \w+ (?!\w) (after simplification).
So now we just need a pattern that matches everything \w matches but _. There are a few approaches that can be taken.
(?[ \w - [_] ])
(?!_)\w
\w(?<!_)
[^\W_]
Even though it's the least readable, I'm going to use the last one since it's the best supported.
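As a quick sanity check on the approaches listed above, here is a minimal sketch in Python (my choice of language, since the OP named none). It compares the three forms that Python's re module accepts; the (?[ ... ]) form is a Perl-specific extended character class, so it is left out. The names CANDIDATES and matched_chars are made up for this illustration, and re.ASCII is assumed so that \w means [a-zA-Z0-9_] as in the question.

```python
import re

# Three ways to write "a word character other than underscore".
# re.ASCII keeps \w equal to [a-zA-Z0-9_], as assumed in the question.
CANDIDATES = {
    "negative lookahead": r"(?!_)\w",
    "negative lookbehind": r"\w(?<!_)",
    "negated class": r"[^\W_]",
}


def matched_chars(pattern):
    """Return the set of ASCII characters that the pattern matches in full."""
    regex = re.compile(pattern, re.ASCII)
    return {chr(c) for c in range(128) if regex.fullmatch(chr(c))}


results = {name: matched_chars(p) for name, p in CANDIDATES.items()}
for name, chars in results.items():
    print(f"{name}: {len(chars)} characters, underscore included: {'_' in chars}")
```

Under these assumptions, all three forms should accept the same 62 characters (letters and digits) and reject the underscore.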
We now have
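Substituting [^\W_] for \w in (?<!\w) \w+ (?!\w) should give (?<![^\W_]) [^\W_]+ (?![^\W_]). Below is a minimal sketch of that pattern in Python rather than Perl, which is an assumption on my part since the OP named no language. re.VERBOSE stands in for the x flag, re.ASCII pins \w to [a-zA-Z0-9_], and WORD, find_words, and the sample text are hypothetical names and data for illustration only.

```python
import re

# \w replaced by [^\W_] throughout (?<!\w) \w+ (?!\w).
# re.VERBOSE ignores the spaces between the three parts (like Perl's x flag);
# re.ASCII keeps \w equal to [a-zA-Z0-9_], as assumed in the question.
WORD = re.compile(r"(?<![^\W_]) [^\W_]+ (?![^\W_])", re.VERBOSE | re.ASCII)


def find_words(text):
    """Return all runs of letters and digits, treating _ as a separator."""
    return WORD.findall(text)


sample = "foo_bar baz42, _qux_"
words = find_words(sample)
print(words)  # ['foo', 'bar', 'baz42', 'qux']
```

For comparison, \b\w+\b on the same sample would return foo_bar, baz42, and _qux_, because \w counts the underscore as a word character.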