Extracting strings from a CSV file using Java CsvReader garbles German umlauts (ö, ä, ü, ...)


I'm extracting German addresses from a CSV file using the Java CsvReader class. Some of the street names contain German umlauts (ö, ä, ü, ...), for example "Sonnige Höhe". Here is the code I use:

// Imports needed (assuming CsvReader comes from the JavaCSV library):
// import com.csvreader.CsvReader;
// import java.io.IOException;
// import java.nio.charset.Charset;

try {
    String addressDataCsvFilename = "Tannis_Export.csv";
    CsvReader addressDataCsvFile = new CsvReader(addressDataCsvFilename, ',', Charset.forName("UTF-8"));
/*
    String[] headers = {
            "PLZ",         // C
            "Strasse",     // E
            "Hausnummer",  // F
        };
 */

    // Read the header row so that columns can be looked up by name
    addressDataCsvFile.readHeaders();
    while (addressDataCsvFile.readRecord()) {
        // workaround for issue with CsvReader not finding header in first column
        String postleitzahl  = addressDataCsvFile.get("PLZ");
        String strassenName  = addressDataCsvFile.get("Strasse");
        String hausNummer    = addressDataCsvFile.get("Hausnummer");
        // ... (rest of the loop body not shown in the original post)
    }
    addressDataCsvFile.close();
} catch (IOException e) {
    e.printStackTrace();
}

Even though I specify UTF-8 as the charset, the strings returned after CsvReader.readRecord() do not contain the umlauts correctly: "Sonnige Höhe" becomes "Sonnige H�he". How can I prevent that?


There is 1 answer

Robert Bethge

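If I change the charset from UTF-8 to ISO-8859-1, it works. This suggests that the file was saved as ISO-8859-1 (Latin-1) rather than UTF-8: the charset passed to CsvReader has to match the encoding the file was actually written in. Here is the modified line: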

// DOESN'T WORK: CsvReader addressDataCsvFile = new CsvReader(addressDataCsvFilename, ',', Charset.forName("UTF-8"));
CsvReader addressDataCsvFile = new CsvReader(addressDataCsvFilename, ',', Charset.forName("ISO-8859-1"));
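
The effect can be reproduced without the CSV library. The following is only a minimal sketch, using the standard java.nio.charset classes. The class name CharsetMismatchDemo and the helpers isValidUtf8 and guessCharset are hypothetical names made up for this illustration. The fallback to ISO-8859-1 is an assumption too: bytes that are not valid UTF-8 could just as well be windows-1252 or another single-byte encoding, and this check cannot tell those apart.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class CharsetMismatchDemo {

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Assumption for this sketch: anything that is not valid UTF-8 is treated as ISO-8859-1.
    public static Charset guessCharset(byte[] bytes) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Simulate the street name as stored in an ISO-8859-1 file:
        // the o-umlaut (U+00F6) is the single byte 0xF6 there.
        byte[] latin1Bytes = "Sonnige H\u00F6he".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset: 0xF6 is not a valid UTF-8 byte,
        // so the decoder substitutes the replacement character U+FFFD.
        String wrong = new String(latin1Bytes, StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String right = new String(latin1Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.contains("\uFFFD"));           // prints true
        System.out.println(right.equals("Sonnige H\u00F6he"));  // prints true
        System.out.println(guessCharset(latin1Bytes));          // prints ISO-8859-1
    }
}
```

In a real program the bytes could come from Files.readAllBytes(Paths.get("Tannis_Export.csv")), and the charset returned by guessCharset could then be passed as the third argument of the CsvReader constructor shown above. If the encoding of the export is known, stating it explicitly is more reliable than guessing.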