Convert flattened CSV to nested JSON


I want to create a nested JSON from a flattened CSV:

CSV:

name address_city address_state
John Mumbai MH
John Bangalore KA
Bill Chennai TN

JSON:

[
  {
    "name": "John",
    "address": [
      {
        "city": "Mumbai",
        "state": "MH"
      },
      {
        "city": "Bangalore",
        "state": "KA"
      }
    ]
  },
  {
    "name": "Bill",
    "address": [
      {
        "city": "Chennai",
        "state": "TN"
      }
    ]
  }
]

I'm using the univocity parser with the @Nested annotation like this:

@Nested(headerTransformer = AddressTypeTransformer.class, args = "address")
private Address address;

This works as expected and gives me the JSON output below, with address as a single object rather than an array:

[
  {
    "name": "John",
    "address": {
      "city": "Mumbai",
      "state": "MH"
    }
  },
  {
    "name": "John",
    "address": {
      "city": "Mumbai",
      "state": "MH"
    }
  },
  {
    "name": "Bill",
    "address": {
      "city": "Chennai",
      "state": "TN"
    }
  }
]

But when I change the code to make address an array:

@Nested(headerTransformer = AddressTypeTransformer.class, args = "address")
private Address[] address;

I get the following error:

Exception in thread "main" com.univocity.parsers.common.DataProcessingException: Unable to instantiate class '[Lcom.ss.beans.Address;'
Internal state when error was thrown: line=2, column=0, record=1, charIndex=58, headers=[id, name, address_city, address_state],

Why does the @Nested annotation not work with arrays/lists? How can I solve this problem? Is there another way to solve it without using univocity?

PS: I'm asking this question after following the reply from @Jeronimo Backes in this post: Convert CSV data into nested json objects using java library


There is 1 answer

andrewJames On

Here is my approach:

The test data (in my case, the fields are tab-separated):

name    address_city    address_state
John    Mumbai  MH
John    Bangalore   KA
Bill    Chennai TN

The imports I used:

import com.google.gson.Gson;
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

The processing code:

//
// parse the source file into a list of SourceRecord beans:
//
Reader reader = new FileReader(new File("C:/tmp/univocity_demo.csv"), StandardCharsets.UTF_8);
BeanListProcessor<SourceRecord> processor = new BeanListProcessor<>(SourceRecord.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setDelimiter("\t"); // tab separated data
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setProcessor(processor);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(reader);
List<SourceRecord> sourceRecords = processor.getBeans();

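//
// process those SourceRecord objects into consolidated Name beans:
// (HashMap does not guarantee iteration order; a java.util.LinkedHashMap
// would keep the names in the order they first appear in the file)
//
Map<String, Name> namesMap = new HashMap<>();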
sourceRecords.forEach(sourceRecord -> {
    String sourceName = sourceRecord.getName();
    if (namesMap.containsKey(sourceName)) {
        namesMap.get(sourceName).getAddresses().add(sourceRecord.getAddress());
    } else {
        Name name = new Name();
        name.setName(sourceName);
        name.getAddresses().add(sourceRecord.getAddress());
        namesMap.put(sourceName, name);
    }
});

//
// convert to JSON:
//
Gson gson = new Gson();
String json = gson.toJson(namesMap.values());
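
The consolidation step above can also be written with streams. The following is only a minimal sketch of that idea, not part of the original approach: the Row and Address classes here are hypothetical stand-ins for the parsed beans, and a LinkedHashMap is used on the assumption that you want names kept in the order they first appear.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {

    // Hypothetical stand-in for the Address bean.
    static final class Address {
        final String city;
        final String state;

        Address(String city, String state) {
            this.city = city;
            this.state = state;
        }

        @Override
        public String toString() {
            return city + "/" + state;
        }
    }

    // Hypothetical stand-in for one parsed source row (a name plus one address).
    static final class Row {
        final String name;
        final Address address;

        Row(String name, Address address) {
            this.name = name;
            this.address = address;
        }
    }

    // Collects every address under its name, keeping names in first-seen order.
    static Map<String, List<Address>> groupByName(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row.name,
                LinkedHashMap::new, // insertion-ordered, unlike HashMap
                Collectors.mapping(row -> row.address, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("John", new Address("Mumbai", "MH")),
                new Row("John", new Address("Bangalore", "KA")),
                new Row("Bill", new Address("Chennai", "TN")));
        System.out.println(groupByName(rows));
    }
}
```

With the three sample rows, the resulting map has the keys John and Bill in that order, holding two addresses and one address respectively.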

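The SourceRecord bean is as follows. Note that we only need the basic @Nested annotation here, with no header transformer, because the Address fields are mapped to the full column names (address_city, address_state) with @Parsed: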

import com.univocity.parsers.annotations.Nested;
import com.univocity.parsers.annotations.Parsed;

public class SourceRecord {

    @Parsed(field = "name")
    private String name;

    @Nested
    private Address address;

    // getters/setters not shown

}

Here are the output Name and Address beans. Note I am using the field name addresses (not address) in the Name bean:

import java.util.ArrayList;
import java.util.List;

public class Name {

    private String name;

    // final, so there is a getter but no setter
    private final List<Address> addresses = new ArrayList<>();

    // getters/setters not shown
}

And the Address bean - this is used both for the final output and also when reading the source file (hence the annotations are needed):

import com.univocity.parsers.annotations.Parsed;

public class Address {

    @Parsed(field = "address_city")
    private String city;

    @Parsed(field = "address_state")
    private String state;

    // getters/setters not shown

}

The final JSON (pretty-printed here for readability; new Gson() emits it on a single line) is:

[{
    "name": "John",
    "addresses": [{
        "city": "Mumbai",
        "state": "MH"
    }, {
        "city": "Bangalore",
        "state": "KA"
    }]
}, {
    "name": "Bill",
    "addresses": [{
        "city": "Chennai",
        "state": "TN"
    }]
}]
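
On the question of doing this without univocity: for simple, unquoted tab-separated input like the sample, the same result can probably be produced with the standard library alone. The sketch below is an illustration under those assumptions, not a general CSV parser. It does not handle quoted fields or embedded tabs, and its hand-written JSON escaping covers only backslashes and double quotes. The class and method names (CsvToNestedJson, convert, quote) are made up for this example, and it uses the key "address" from the question rather than "addresses".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvToNestedJson {

    // Escapes backslashes and double quotes only; a real JSON library
    // (such as Gson) also takes care of control characters.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Expects a header line followed by rows of: name <TAB> city <TAB> state.
    static String convert(String tsv) {
        // name -> list of {city, state} pairs, in first-seen order
        Map<String, List<String[]>> addressesByName = new LinkedHashMap<>();
        String[] lines = tsv.split("\\R");
        for (int i = 1; i < lines.length; i++) { // index 0 is the header
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = lines[i].split("\t", -1);
            if (cols.length < 3) {
                throw new IllegalArgumentException("Expected 3 columns on line " + (i + 1));
            }
            addressesByName.computeIfAbsent(cols[0], key -> new ArrayList<>())
                    .add(new String[] {cols[1], cols[2]});
        }

        StringBuilder json = new StringBuilder("[");
        boolean firstName = true;
        for (Map.Entry<String, List<String[]>> entry : addressesByName.entrySet()) {
            if (!firstName) {
                json.append(",");
            }
            firstName = false;
            json.append("{\"name\":").append(quote(entry.getKey())).append(",\"address\":[");
            List<String[]> addresses = entry.getValue();
            for (int j = 0; j < addresses.size(); j++) {
                if (j > 0) {
                    json.append(",");
                }
                json.append("{\"city\":").append(quote(addresses.get(j)[0]))
                        .append(",\"state\":").append(quote(addresses.get(j)[1]))
                        .append("}");
            }
            json.append("]}");
        }
        return json.append("]").toString();
    }

    public static void main(String[] args) {
        String tsv = "name\taddress_city\taddress_state\n"
                + "John\tMumbai\tMH\n"
                + "John\tBangalore\tKA\n"
                + "Bill\tChennai\tTN\n";
        System.out.println(convert(tsv));
    }
}
```

For real-world CSV (quoted values, delimiters inside fields, other encodings), a proper parser plus a JSON library, as in the approach above, is the safer choice.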