Java String.split with "[^a-zA-Z0-9]+" still showing whitespace as a word


I am having a problem with a Java program that builds a word-to-frequency map for a given document. When I print all the words out, I still see " " as a 'word'.

Here is the paraphrased code:

String delimiters = "[^a-zA-Z0-9]+";
String[] words;
SortedSet<String> allWords = new TreeSet<String>();
Map<String, Map<String, Integer>> wordMap = new HashMap<String, Map<String, Integer>>();

while ((line = bufferedReader.readLine()) != null) {
    words = line.split(delimiters);
    for (String word : words) {
        allWords.add(word);
        // ... also record the word in wordMap for the current file
    }
}

for (String word : allWords) {
    System.out.println(word + " : " + wordMap.get(word).entrySet());
}

Here is some sample output:

Time elapsed: 0.75 seconds.
 : [books/dickens.txt=7]       // WHAT ARE YOU?!?! How does this happen??!?!
10 : [books/dickens.txt=2]
11th : [books/dickens.txt=2]
12th : [books/dickens.txt=2]

How is this whitespace showing up? Thanks

P.S. If you want to see the full code, here is a link

Claudiu On BEST ANSWER

That is not whitespace; it is an empty string. This happens when the file contains empty lines.

Doing something like this

words = "".split(delimiters);

results in an array with exactly one element, and that element is an empty string.
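
For illustration, here is a minimal sketch of that behaviour and one possible way to skip the empty token. The class name SplitDemo and the helper nonEmptyTokens are made up for this example and are not from the asker's code. It also covers a related case: a line that starts with a delimiter (for example, leading spaces) yields an empty first element too.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Same pattern as in the question: one or more non-alphanumeric characters.
    static final String DELIMITERS = "[^a-zA-Z0-9]+";

    // Hypothetical helper: splits a line and drops any empty strings that split() returns.
    static List<String> nonEmptyTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.split(DELIMITERS)) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An empty line has no match, so split() returns the input itself: one empty string.
        System.out.println("".split(DELIMITERS).length);              // 1

        // A delimiter at the very start produces an empty leading element.
        System.out.println(" hello world".split(DELIMITERS).length);  // 3

        // Filtering leaves only the real words.
        System.out.println(nonEmptyTokens(""));                       // []
        System.out.println(nonEmptyTokens(" hello world"));           // [hello, world]
    }
}
```

In the loop from the question, the same effect could be had by checking word.isEmpty() before adding the word to allWords and wordMap, or by skipping a line when line.isEmpty() is true (though the latter would not catch lines that begin with a delimiter).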