Parse CSV with extra columns which are not defined in POJO using Jackson


I am trying to parse a CSV file and deserialize it into a POJO using the Jackson library. I have tried several approaches but can't get Jackson to ignore the extra columns in the CSV that are not defined in the POJO. Requirements:

  1. The columns in the incoming CSV can be in any order.
  2. There can be some columns which are defined in the POJO but absent from the CSV (missing columns).
  3. There can be some columns in the CSV which are not defined in POJO (extra columns).

I have already tried @JsonIgnoreProperties(true) and DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, but neither seems to work properly.

POJO:

public class Student {
   @JsonProperty("STUDENT_NAME")
   private String name;

   @JsonProperty("DOB")
   private String dateOfBirth;

   @JsonProperty("ID")
   private String id;
}

CSV:

STUDENT_NAME,ID,STANDARD,DOB
John,1,4,01/02/2000
Doe,2,5,02/01/1999

1 answer

Answered by KevinO

I believe the issue is that you need to supply an explicit CsvSchema.

For the Student class defined as follows:

@JsonIgnoreProperties(ignoreUnknown = true)
public static class Student
{
    @JsonProperty("STUDENT_NAME")
    private String name;
        
    @JsonProperty("DOB")
    private String dateOfBirth;
        
    @JsonProperty("ID")
    private String id;

    ...
}

This approach appears to work (tested with a String input, but it can easily be changed to read a file):

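import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public static List<Student> parse(String inp) throws IOException
{
    // Name the columns the POJO cares about, treat the first line as a
    // header, and let the header decide the actual column order.
    CsvSchema schema = CsvSchema.builder()
            .addColumn("STUDENT_NAME")
            .addColumn("ID")
            .addColumn("DOB")
            .setReorderColumns(true)
            .setUseHeader(true)
            .build();

    CsvMapper mapper = new CsvMapper();
    MappingIterator<Student> iter = mapper
            .readerFor(Student.class)
            .with(schema)
            .readValues(inp);

    // Collect one Student per data row
    List<Student> students = new ArrayList<>();
    while (iter.hasNext()) {
        students.add(iter.next());
    }

    return students;
}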

Tested via:

final String example = "STUDENT_NAME,ID,STANDARD,DOB\n"
        + "John,1,4,01/02/2000\n"
        + "Doe,2,5,02/01/1999";


@org.junit.Test
public void testViaInp() throws IOException
{
    List<Student> students = ParseCsvWithJackson.parse(example);
    Assert.assertNotNull("null students", students);
    Assert.assertEquals("Wrong # of students", 2, students.size());
        
    Student first = students.get(0);
    Assert.assertEquals("Wrong name", "John", first.getName());
    Assert.assertEquals("Wrong id", "1", first.getId());
        
    System.out.println(first);
}

Update: I added a test that reads from a CSV file, and it passed the same assertions. The one caveat is that I had to enable CsvParser.Feature.SKIP_EMPTY_LINES, because a blank line at the end of the file was otherwise creating an entry with null values.

    private static CsvSchema getSchema()
    {
        return CsvSchema.builder().addColumn("STUDENT_NAME").addColumn("ID")
                .addColumn("DOB").setReorderColumns(true).setUseHeader(true)
                .build();
    }

    private static CsvMapper getMapper()
    {
        CsvMapper mapper = new CsvMapper();
        
        mapper.enable(CsvParser.Feature.SKIP_EMPTY_LINES);
        
        return mapper;
    }

    public static List<Student> parse(Path csvfile) throws IOException
    {
        CsvSchema schema = getSchema();
        
        CsvMapper mapper = getMapper();
        MappingIterator<Student> iter = mapper.readerFor(Student.class)
                .with(schema).readValues(csvfile.toFile());

        List<Student> students = new ArrayList<>();
        while (iter.hasNext()) {
            students.add(iter.next());
        }

        return students;
    }
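
To make it clearer what the header-driven setup above is doing conceptually, here is a minimal sketch in plain Java, without Jackson. It is not how Jackson is implemented; it only illustrates the idea of looking up columns by header name so that column order does not matter, extra columns are ignored, and missing columns stay null. The class name HeaderMappingSketch, its nested Student stand-in, and the helper cell are made up for this example, and the naive comma split assumes there are no quoted fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Jackson's actual implementation.
class HeaderMappingSketch {

    // Plain stand-in for the Student POJO from the answer.
    static final class Student {
        String name;
        String dateOfBirth;
        String id;
    }

    static List<Student> parse(String csv) {
        List<Student> students = new ArrayList<>();
        String[] lines = csv.split("\\R");
        if (lines.length == 0 || lines[0].trim().isEmpty()) {
            return students;
        }

        // Map each header name to its position, so order is irrelevant.
        Map<String, Integer> index = new HashMap<>();
        String[] header = lines[0].split(",", -1);
        for (int i = 0; i < header.length; i++) {
            index.put(header[i].trim(), i);
        }

        for (int r = 1; r < lines.length; r++) {
            if (lines[r].trim().isEmpty()) {
                continue; // skip blank lines instead of emitting an empty row
            }
            // Assumption: no quoted fields, so a plain split is enough here.
            String[] cells = lines[r].split(",", -1);
            Student s = new Student();
            // Only known columns are looked up; extra columns are never read.
            s.name = cell(cells, index.get("STUDENT_NAME"));
            s.dateOfBirth = cell(cells, index.get("DOB"));
            s.id = cell(cells, index.get("ID"));
            students.add(s);
        }
        return students;
    }

    // Returns null when the column is missing from the header or the row.
    private static String cell(String[] cells, Integer pos) {
        return (pos == null || pos >= cells.length) ? null : cells[pos].trim();
    }
}
```

Calling HeaderMappingSketch.parse with the CSV from the question should give two entries with STANDARD ignored. For real input, the Jackson version above is the better choice, since it handles quoting and escaping.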