Univocity CSV parser glues the whole line if it begins with quote "


I'm using univocity 2.7.5 to parse a CSV file. Until now it worked fine and parsed each row as a String array with n elements, where n is the number of columns in the row. But now I have a file where rows start with a quote ("), and the parser cannot handle it: it returns each row as a String array with only one element, which contains the whole row's data. I tried removing that quote from the CSV file and it worked fine, but there are about 500,000 rows. What should I do to make it work?

Here is the sample line from my file (it has quotes in source file too):

 "100926653937,Kasym Amina,620414400630,Marzhan Erbolova,""Kazakhstan, Almaty, 66, 3"",87029845662"

And here's my code:

    CsvParserSettings settings = new CsvParserSettings();
    settings.setDelimiterDetectionEnabled(true);
    CsvParser parser = new CsvParser(settings);
    List<String[]> rows = parser.parseAll(csvFile);

Answered by Jeronimo Backes

Author of the library here. The input you have there is a well-formed CSV, with a single value consisting of:

100926653937,Kasym Amina,620414400630,Marzhan Erbolova,"Kazakhstan, Almaty, 66, 3",87029845662
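
To make the quoting rule concrete, here is a minimal sketch of a splitter that follows the usual CSV convention (a quoted field may contain commas, and a doubled quote inside it stands for one literal quote). It is not univocity's implementation; `QuotedLineDemo` and `splitCsvLine` are hypothetical names used only for illustration. It shows that the sample line decodes to one value, and that decoding that value a second time yields the six columns.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedLineDemo {

    // Splits one CSV record into fields. Inside a quoted field, commas are
    // ordinary characters and "" is read as a single literal quote.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field of the record
        return fields;
    }

    public static void main(String[] args) {
        // The sample line from the question, outer quotes included.
        String raw = "\"100926653937,Kasym Amina,620414400630,Marzhan Erbolova,"
                + "\"\"Kazakhstan, Almaty, 66, 3\"\",87029845662\"";

        List<String> firstPass = splitCsvLine(raw);
        System.out.println(firstPass.size());   // prints 1

        // Decoding the single value again gives the real columns.
        List<String> secondPass = splitCsvLine(firstPass.get(0));
        System.out.println(secondPass.size());  // prints 6
        System.out.println(secondPass.get(4));  // prints Kazakhstan, Almaty, 66, 3
    }
}
```

Because the whole line sits inside one pair of quotes, every comma in it belongs to that single field, which is why the parser hands back an array of length one.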

If that row appeared in the middle of your input, I suppose your input has unescaped quotes somewhere before that line. Try experimenting with the unescaped quote handling setting. For example, this might work:

settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);

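If nothing works, and all your lines look like the one you posted, then you can parse the input twice (which is ugly and slow, but will work): read each row as usual and, whenever a row comes back as a single value, parse that value again as a line of its own: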

CsvParser parser = new CsvParser(settings);
parser.beginParsing(csvFile);

List<String[]> out = new ArrayList<>();
String[] row;
while ((row = parser.parseNext()) != null) {
    // Got a row with an unexpected length?
    if (row.length == 1) {
        // It is the whole line as one value, so break it down again.
        row = parser.parseLine(row[0]);
    }
    out.add(row);
}

Hope this helps.