Buffered reader - remove punctuation

I need help with a reader that removes punctuation and numbers and builds an array of strings from the input.

For example, the input is a file "example.txt" that contains something like this:

Hello 123 , I'am new example ... text file!

I need my reader to create an array that contains this:

String[] example = {"Hello", "I", "am", "new", "example", "text", "file"};

Is there a way to remove punctuation and numbers and create an array of strings with a buffered reader?

Thank you in advance, Fipkus.

There are 3 answers

0
Filip Kraus On BEST ANSWER

In the end, I fixed it like this:

char[] abeceda = {'a','á','b','c','č','d','ď','e','é','ě','f','g','h',
            'i','í','j','k','l','m','n','ň','o','ó','p','q','r','ř','s','š','t','ť',
            'u','ú','ů','v','w','x','y','ý','z','ž','A','Á','B','C','Č','D','Ď','E','É',
            'Ě','F','G','H','I','Í','J','K','L','M','N','Ň','O','Ó','P','Q','R','Ř','S','Š','T',
            'Ť','U','Ú','Ů','V','W','X','Y','Ý','Z','Ž',' '};



String vlozena = userInputScanner.nextLine();
String fileContentsSingle = "";
int length = vlozena.length();

/*
 * Checks whether each character is a space or a letter of the Czech
 * alphabet and, if it is, appends it to the sentence.
 */
for (int j = 0; j < length; j++) {
    char cha = vlozena.charAt(j);
    for (char z : abeceda) {
        if (cha == z) {
            fileContentsSingle = fileContentsSingle + cha;
            break; // a character matches at most one entry
        }
    }
}

// Collapse runs of whitespace and trim the ends so that split()
// does not produce empty strings, then lowercase and split into words.
fileContentsSingle = fileContentsSingle.replaceAll("\\s+", " ").trim();
fileContentsSingle = fileContentsSingle.toLowerCase();
String[] vetaNaArraySingle = fileContentsSingle.split("\\s+");
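
A shorter variant of the same idea, as a minimal sketch rather than the code I actually used: Character.isLetter can stand in for the hand-written alphabet array. Note that this is an assumption that changes the behavior slightly, because isLetter accepts letters of every alphabet, not only Czech ones. The class name LetterFilter and the method name toWords are made up for illustration.

```java
import java.util.Locale;

public class LetterFilter {

    // Keeps letters and spaces, drops every other character,
    // then lowercases the result and splits it on whitespace.
    public static String[] toWords(String input) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Assumption: any Unicode letter is acceptable, not just the Czech alphabet.
            if (Character.isLetter(c) || c == ' ') {
                kept.append(c);
            }
        }
        String cleaned = kept.toString().trim().toLowerCase(Locale.ROOT);
        // Splitting an empty string would give one empty element, so handle it separately.
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        for (String word : toWords("Hello 123 , I'am new example ... text file!")) {
            System.out.println(word);
        }
    }
}
```

Like the loop above, this deletes the apostrophe instead of splitting on it, so "I'am" becomes the single word "iam".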
2
Ben Minton On

Another method is using StringTokenizer. It's a little more restrictive, but I prefer it since you just list the delimiters instead of regex, which is a little easier to read.

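import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

String test = "Hello 123 , I'am new example ... text file!";
List<String> exampleTemp = new ArrayList<>();

// Every character in the second argument is a delimiter. The apostrophe
// is included so that "I'am" is split into "I" and "am".
StringTokenizer st = new StringTokenizer(test, " ,.'1234567890!");
while (st.hasMoreTokens())
{
    exampleTemp.add(st.nextToken());
}
// Let toArray size the array instead of hard-coding the number of words.
String[] example = exampleTemp.toArray(new String[0]);

for (String word : example)
{
    System.out.println(word);
}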

Edit: I modified it to fill a String array. Not sure about the white space issue.

1
brlaranjeira On

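Use String.split(regex). The regex should be a character class that lists the characters to remove, followed by + so that a run of them counts as a single separator, for example String regex = "[ ,.!'0-9]+". Without the square brackets the pattern would only match that exact sequence of characters. If the input starts with a separator, split returns an empty string as the first element, so skip empty entries.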
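
To tie this back to the question, here is a minimal sketch that combines a BufferedReader with String.split. It assumes that a word is any run of Unicode letters, so the pattern [^\p{L}]+ treats everything else (digits, punctuation, whitespace) as a separator. The class name WordReader and the method name readWords are hypothetical, and a StringReader stands in for "example.txt" so the sketch runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

public class WordReader {

    // Reads all lines from the reader and returns the words they contain.
    // Any run of non-letter characters is treated as one separator.
    public static String[] readWords(BufferedReader reader) {
        return reader.lines()
                .flatMap(line -> Arrays.stream(line.split("[^\\p{L}]+")))
                // split() yields an empty string for blank lines and for
                // lines that start with a separator, so drop those.
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws IOException {
        String text = "Hello 123 , I'am new example ... text file!";
        // For a real file, wrap a FileReader here instead of the StringReader.
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            for (String word : readWords(reader)) {
                System.out.println(word);
            }
        }
    }
}
```

For the sample sentence this prints Hello, I, am, new, example, text and file, one per line, which matches the array in the question.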