Removing All Non-Word Characters (Punctuation) From A String

Okay, this is my first time posting, so please excuse any mistakes. Long story short, I'm given an array of Strings, and my objective is to count how many times each unique word occurs while stripping any punctuation characters from the entries.

public static HashMap<String, Integer> uniqueWords(String[] book) {
    HashMap<String, Integer> hm = new HashMap<>();

    for (int i = 0; i < book.length; i++) {
        if (hm.containsKey(book[i])) {
            hm.put(book[i], hm.get(book[i]) + 1);
        } else {
            book[i] = book[i].replaceAll("[^a-zA-Z]","").replaceAll("\\p{Punct}","").replaceAll("\\W+","").replaceAll("\\n","").toLowerCase();
            hm.put(book[i], 1);
        }
    }
    return hm;
}

Input: {"Redfish", "redfish", "redfish", "Bluefish", "bluefish", "bluefish", "*", "%", ""};

Output: {=2, bluefish=3, redfish=3}

So I've managed to strip the whitespace, but the asterisk and the percent sign are still being counted, as the empty key in the output shows.

Any help is appreciated, thank you.

There is 1 answer

pizzaslice On

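Try something like this. Clean each string before you check the map, and skip anything that is empty after cleaning:

import java.util.HashMap;

public static HashMap<String, Integer> uniqueWords(String[] book) {
    HashMap<String, Integer> hm = new HashMap<>();

    for (int i = 0; i < book.length; i++) {
        // Normalize first, so "Redfish" and "redfish" share one key.
        // "[^a-zA-Z]" already removes punctuation, whitespace and newlines,
        // so the extra replaceAll calls are not needed.
        String word = book[i].replaceAll("[^a-zA-Z]", "").toLowerCase();

        // "*", "%" and "" are all empty after cleaning; don't count them.
        if (word.isEmpty()) {
            continue;
        }

        // Store 1 for a new word, otherwise add 1 to the existing count.
        hm.merge(word, 1, Integer::sum);
    }
    return hm;
}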
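
If it helps, here is a self-contained sketch of the normalize-then-count idea that you can compile and run against the input from the question. The class name `UniqueWordsDemo` and the `main` method are just for illustration (they are not from the question), and I'm assuming that "word" means ASCII letters only, as the `[^a-zA-Z]` pattern in the question suggests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class, only here so the snippet compiles on its own.
public class UniqueWordsDemo {

    // Counts each word after stripping non-letters and lowercasing it.
    public static Map<String, Integer> uniqueWords(String[] book) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : book) {
            // Clean the entry before touching the map, so the lookup
            // and the insert both use the same normalized key.
            String word = raw.replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (word.isEmpty()) {
                continue; // "*", "%" and "" all end up here
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] book = {"Redfish", "redfish", "redfish",
                         "Bluefish", "bluefish", "bluefish", "*", "%", ""};
        Map<String, Integer> counts = uniqueWords(book);
        System.out.println(counts.get("redfish"));   // prints 3
        System.out.println(counts.get("bluefish"));  // prints 3
        System.out.println(counts.containsKey(""));  // prints false
    }
}
```

As for where `=2` came from in the question's output: the `containsKey` check ran on the raw string, so "*" and "%" were each cleaned to "" and stored with a count of 1 (the second overwriting the first), and the trailing "" then matched that key and bumped it to 2.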