Java: delete String tokens that contain numbers


I have a string like this and I would like to eliminate all the tokens that contain a number:

 String[] s="In the 1980s".split(" ");

Is there a way to remove the tokens that contain numbers - in this case 1980s, but also, for example 784th or s787?

4

There are 4 answers

2
AudioBubble On BEST ANSWER

Use the regex \w*\d\w* for that. It matches every word that has at least one digit in it. Although I generally despise regexes, they are particularly well suited to your problem.

String[] s = input.replaceAll("\\w*\\d\\w* *", "").split(" +");

See the Java library docs for Pattern and Matcher for more on how to work with regexes in general.

Test code: http://ideone.com/LrHDsT

2
Bohemian On

Remove the unwanted words first, then split:

String[] s = str.replaceAll("\\w*\\d\\w*", "").trim().split(" +");

Some test code:

import java.util.Arrays;

String str = "666 In the 1980s 784th s787 foo BAR";
String[] s = str.replaceAll("\\w*\\d\\w*", "").trim().split(" +");
System.out.println(Arrays.toString(s));

Output:

[In, the, foo, BAR]

0
Anirudh On

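You could use a regex as suggested by @vaxquis, or alternatively split the string on the delimiter first.

You could then inspect each token, check whether it has a digit in it, and remove the tokens that do. Note that NumberUtils.isNumber (Apache Commons Lang) only accepts tokens that are numbers as a whole, so it catches 666 but not 1980s or s787; for those you need a per-character check such as Character.isDigit.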
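
As a minimal sketch of the split-then-filter idea, the following checks each token for a digit with Character.isDigit instead of a whole-number test. The class and method names are made up for illustration, and splitting on runs of whitespace is an assumption about the delimiter.

```java
import java.util.Arrays;

class DigitTokenFilter {
    // True if at least one character of the token is a digit.
    static boolean containsDigit(String token) {
        return token.chars().anyMatch(Character::isDigit);
    }

    // Splits on runs of whitespace and keeps only the tokens without digits.
    static String[] tokensWithoutDigits(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(t -> !t.isEmpty() && !containsDigit(t))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints [In, the]
        System.out.println(Arrays.toString(tokensWithoutDigits("In the 1980s 784th s787")));
    }
}
```

This walks every token once and needs no regex for the digit test, at the cost of being a little longer than the replaceAll one-liner.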

6
Pshemo On

split alone doesn't seem to be what you are looking for. Even if you first remove the words that contain a digit, as in

"1foo f2oo bar whatever baz2"

you will end up with

"  bar whatever " 

and if you now split on one or more spaces (" +") you will end up with ["", "bar", "whatever"].

To solve this problem you may also want to remove the spaces after each word you removed, so that

"1foo f2oo bar whatever baz2"

would become

"bar whatever "

so it can be split correctly (the space at the end is not a problem, since split by default removes trailing empty strings from the result array).
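
To make the difference concrete, here is a small sketch that runs both variants on the same input. The class and method names are hypothetical; the two regexes are the ones discussed in this thread.

```java
import java.util.Arrays;

class LeadingEmptyTokenDemo {
    // Removes only the digit-bearing words; the spaces around them stay behind.
    static String[] removeWordsOnly(String text) {
        return text.replaceAll("\\w*\\d\\w*", "").split(" +");
    }

    // Removes the digit-bearing words together with the spaces that follow them.
    static String[] removeWordsAndSpaces(String text) {
        return text.replaceAll("\\w*\\d\\w* *", "").split(" +");
    }

    public static void main(String[] args) {
        String input = "1foo f2oo bar whatever baz2";
        // prints [, bar, whatever] (note the empty first element)
        System.out.println(Arrays.toString(removeWordsOnly(input)));
        // prints [bar, whatever]
        System.out.println(Arrays.toString(removeWordsAndSpaces(input)));
    }
}
```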


But instead of making two passes (removing words and then splitting the string) you can achieve the same thing in a single pass. All you need to do is take the opposite approach: instead of focusing on removing the wrong elements, let's try to find the correct ones.
Correct tokens seem to be words that consist of non-space characters, none of which is a digit. You can represent such words with the regex \b[\S&&\D]+\b where:

  • \b represents a word boundary,
  • \S represents any non-whitespace character,
  • \D represents any non-digit character,
  • [\S&&\D] is the intersection of non-whitespace and non-digit characters, in other words non-whitespace characters that are also non-digits.

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "1foo f2oo bar whatever baz2";
Pattern p = Pattern.compile("\\b[\\S&&\\D]+\\b");
Matcher m = p.matcher(input);
while(m.find())
    System.out.println(m.group());

Output:

bar
whatever
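
Since the question asks for a String[] rather than printed output, the same matcher loop can collect its matches into an array. This is only a sketch; the class, constant, and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CleanTokenMatcher {
    // Whole words made only of characters that are neither whitespace nor digits.
    private static final Pattern CLEAN_WORD = Pattern.compile("\\b[\\S&&\\D]+\\b");

    // Collects every match of CLEAN_WORD, in order of appearance.
    static String[] cleanTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = CLEAN_WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [In, the]
        System.out.println(String.join(", ", "[" + String.join(", ", cleanTokens("In the 1980s")) + "]"));
    }
}
```

Because the tokens are found directly, no empty strings can end up in the result, whichever end of the input the removed words sit at.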

BTW, to avoid the possible empty element at the start of the result you can use Scanner, which doesn't return an empty token when a delimiter is found at the start of the string. We can simply set the delimiter to a run of whitespace or words that contain a digit, so your code can also look like this:

import java.util.Scanner;

Scanner sc = new Scanner(input);
sc.useDelimiter("(\\s|\\w*\\d\\w*)+");
while (sc.hasNext())
    System.out.println(sc.next());
sc.close();