JAVA REGEX :: Could you explain this?


My pattern is [a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b

My Result:

a__
not matched
a_.
pattern matched = a_
a._
pattern matched = a.
a..
pattern matched = a

Why is my first input the only one that does not match? Thanks in advance.

[ PS: got the same result with [a-z][\\*\\+\\-\\_\\.\\,\\|\\s]?\\b ]


There are 2 answers

Amadan (best answer)

Because, unlike the period ., the underscore _ counts as a word character; so a_ is one word, whereas a. is a word followed by punctuation.

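So, a__ matches a, then matches _, then fails to match a word boundary (since the next _ is part of the same word). Backtracking to skip the optional character class does not help either, because there is no word boundary between a and the first _.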

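For a.., the engine matches a and first tries the period for the optional character class, but there is no word boundary between the two periods. It then backtracks, skips the character class, and matches the word boundary between the word a and the punctuation that follows it.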
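The behaviour described above can be reproduced with a small program. This is a minimal sketch, assuming the original results were produced with Matcher.find() (the question does not say which method was used); the class name WordBoundaryDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern PATTERN = Pattern.compile("[a-z][*+\\-_.,|\\s]?\\b");

    // Hypothetical helper: returns the first substring matched by find(),
    // or null when the pattern matches nowhere in the input.
    static String firstMatch(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a__", "a_.", "a._", "a.."}) {
            String match = firstMatch(input);
            System.out.println(input + " -> "
                    + (match == null ? "not matched" : "pattern matched = " + match));
        }
    }
}
```

Under that assumption, the four inputs give the same results as in the question: only a__ fails, because neither position after the a is a word boundary.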

fge

With the regex rewritten in a "proper way", that is:

"[a-z][*+\\-_.,|\\s]?\\b"

Or, in an "unquoted", canonical way:

[a-z][*+\-_.,|\s]?\b

it is expected that your first input does not match: a character class only ever matches a single character. After it matches the first underscore, the engine looks for a word boundary but cannot find one, because for the Java regex engine _ is a character that can be part of a word. Hence the result.
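As a rough check of both points, the sketch below compares the heavily escaped pattern from the question with the simplified one, and uses \w as a stand-in for "can be part of a word". The class name EscapeEquivalenceDemo and its helper methods are hypothetical, and the use of find() is again an assumption.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeEquivalenceDemo {
    // Pattern exactly as posted in the question (every symbol escaped).
    static final Pattern ORIGINAL = Pattern.compile("[a-z][\\*\\+\\-_\\.\\,\\|\\s]?\\b");
    // Simplified form: inside a character class only the hyphen needs escaping here.
    static final Pattern SIMPLIFIED = Pattern.compile("[a-z][*+\\-_.,|\\s]?\\b");

    // Hypothetical helper: first substring found by the pattern, or null.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    // True when both spellings of the pattern give the same result for the input.
    static boolean sameResult(String input) {
        return Objects.equals(firstMatch(ORIGINAL, input), firstMatch(SIMPLIFIED, input));
    }

    // Uses \w as a proxy for "word character"; it agrees with \b for _ and .
    static boolean isWordChar(String singleChar) {
        return singleChar.matches("\\w");
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a__", "a_.", "a._", "a.."}) {
            System.out.println(input + " same result: " + sameResult(input));
        }
        System.out.println("_ is a word character: " + isWordChar("_"));
        System.out.println(". is a word character: " + isWordChar("."));
    }
}
```

The extra backslashes are harmless but unnecessary, so the simplified spelling is easier to read without changing what is matched.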