I am having a problem with a program that creates a word to frequency map for a given document in Java. When I print all the words out I still see " " as a 'word'.
Here is the paraphrased code:
String delimiters = "[^a-zA-Z0-9]+";
String[] words;
SortedSet<String> allWords = new TreeSet<String>();
Map<String, Map<String, Integer>> wordMap = new HashMap<String, Map<String, Integer>>();
while ((line = bufferedReader.readLine()) != null) {
words = line.split(delimiters);
// for each word: add it to the allWords set and update its count in wordMap
}
for (String word : allWords) {
System.out.println(word + " : " + wordMap.get(word).entrySet());
}
Here is some sample output:
Time elapsed: 0.75 seconds.
: [books/dickens.txt=7] // WHAT ARE YOU?!?! How does this happen??!?!
10 : [books/dickens.txt=2]
11th : [books/dickens.txt=2]
12th : [books/dickens.txt=2]
How is this whitespace showing up? Thanks
P.S. If you want to see the full code, here is a link
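That is not whitespace; it is an empty string. This happens when you have empty lines in the file. It also happens when a line begins with a delimiter character (a leading space or quote, for example), because split then returns an empty string as the first element.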
Calling "".split(delimiters) on an empty line results in an array with one element, and that element is an empty string.
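Here is a minimal sketch of the behavior and one way to guard against it. The class name SplitDemo and the helper tokenize are made up for illustration and are not from the original code; the delimiter pattern is the one from the question. The idea is simply to skip any token for which isEmpty() is true before adding it to the set and the map.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Same delimiter pattern as in the question.
    static final String DELIMITERS = "[^a-zA-Z0-9]+";

    // Hypothetical helper (not in the original code): splits a line
    // and drops the empty tokens that split can produce.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String word : line.split(DELIMITERS)) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An empty line yields a one-element array holding "".
        System.out.println("".split(DELIMITERS).length);               // prints 1

        // A line that starts with a delimiter yields a leading "".
        System.out.println("  Hello, world".split(DELIMITERS).length); // prints 3

        // With the isEmpty() check, only real words remain.
        System.out.println(tokenize(""));                              // prints []
        System.out.println(tokenize("  Hello, world"));                // prints [Hello, world]
    }
}
```

In the loop from the question, the same check would go around the code that adds each word to allWords and wordMap.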