How can I include the caret character ^ in an Apache Commons CharSet?


The Apache Commons CharSet class has its own syntax for specifying the characters to include in the set. In that syntax the caret character (^) has a special meaning (negation), but there isn't any documentation about how to include the caret in the set itself without negating the character that comes after it. For example, the following unexpectedly returns false:

CharSet.getInstance("!@#$%^&").contains('&');

Interestingly, the following returns true, which I think is a bug (although the JavaDoc doesn't specify how it should behave):

CharSet.getInstance("!@#$%^&").contains('^')

(Update: see the answer by Duncan Jones below for an explanation of this behavior.)

My question is, how can I specify a CharSet that includes ^ as a character without affecting other characters in the set?


There are 3 answers

jakub.petr (best answer)

The getInstance method looks like this:

 public static CharSet getInstance(String... setStrs) 

According to the javadoc it is completely valid (and it is kind of documented) to do this:

CharSet.getInstance("^", "!@#$%&").contains('^')

Reimeus

You could put the caret last, where it has nothing left to negate:

CharSet.getInstance("!@#$%&^").contains('&')

returns true.

Duncan Jones

> Interestingly, the following returns true, which I think is a bug (although the JavaDoc doesn't specify how it should behave):
>
> CharSet.getInstance("!@#$%^&").contains('^')

This behaviour seems odd until you consider what your supplied character set means. Your set contains the characters from the set [!@#$%] plus anything that's not the character &.

Consequently, ^ is in the set since it's not the character &.

In case you're interested, CharSet.getInstance("&^&").contains('&') returns true: the string is parsed left to right, so the leading & is taken as a literal character before ^& is read as "anything that's not &". This is not clear in the docs, however.
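
To make the cases in this thread easier to check, here is a minimal, self-contained sketch of the parsing rules as I understand them from the CharSet javadoc. It is not the library's code: the class CaretSketch and its Range helper are hypothetical names, and the model assumes that a caret negates the next character (or the next a-z style range), that a caret with nothing after it is a literal, and that a character is in the set if any parsed element accepts it.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the CharSet set-string syntax.
// This is an illustration of the rules discussed above, not the library's code.
class CaretSketch {

    // One parsed element: an inclusive range [start, end], optionally negated.
    static final class Range {
        final char start;
        final char end;
        final boolean negated;

        Range(char a, char b, boolean negated) {
            this.start = (char) Math.min(a, b);
            this.end = (char) Math.max(a, b);
            this.negated = negated;
        }

        boolean contains(char ch) {
            boolean inside = ch >= start && ch <= end;
            return inside != negated;
        }
    }

    // Parses each set string left to right into ranges.
    static List<Range> parse(String... setStrs) {
        List<Range> ranges = new ArrayList<>();
        for (String s : setStrs) {
            if (s == null) {
                continue;
            }
            int pos = 0;
            while (pos < s.length()) {
                int remaining = s.length() - pos;
                if (remaining >= 4 && s.charAt(pos) == '^' && s.charAt(pos + 2) == '-') {
                    // Negated range such as ^a-z
                    ranges.add(new Range(s.charAt(pos + 1), s.charAt(pos + 3), true));
                    pos += 4;
                } else if (remaining >= 3 && s.charAt(pos + 1) == '-') {
                    // Plain range such as a-z
                    ranges.add(new Range(s.charAt(pos), s.charAt(pos + 2), false));
                    pos += 3;
                } else if (remaining >= 2 && s.charAt(pos) == '^') {
                    // Negated single character such as ^&
                    ranges.add(new Range(s.charAt(pos + 1), s.charAt(pos + 1), true));
                    pos += 2;
                } else {
                    // Literal character, including a caret with nothing after it
                    ranges.add(new Range(s.charAt(pos), s.charAt(pos), false));
                    pos += 1;
                }
            }
        }
        return ranges;
    }

    // A character is in the set if any parsed range accepts it.
    static boolean contains(char ch, String... setStrs) {
        for (Range r : parse(setStrs)) {
            if (r.contains(ch)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains('&', "!@#$%^&"));      // false: ^& means "not &"
        System.out.println(contains('^', "!@#$%^&"));      // true: ^ is "not &"
        System.out.println(contains('&', "!@#$%&^"));      // true: trailing ^ is a literal
        System.out.println(contains('^', "^", "!@#$%&"));  // true: lone ^ is a literal
        System.out.println(contains('&', "&^&"));          // true: leading & is a literal
    }
}
```

Under these assumptions, both workarounds from the other answers (a caret on its own in a separate string, or a caret at the very end) give a set that contains ^ and & without pulling in unrelated characters such as 'a'.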