How to process CSV files with unknown token length


I am writing a Scalding job to transform data in the following format:

Id,Name,Param1,Val1,Param2,Val2,...,Paramx,Valx
1,Cat,Hair,White,Eye,Blue...

Into:

Id,Name,Param,Val
1,Cat,Hair,White
1,Cat,Eye,Blue

My problem is that I don't know how many Param/Val pairs a given line contains; I only know that every line is comma-separated. How can I write a Scalding/MapReduce job to transform my data?

Everything I have read recommends doing the following:

Csv("data.csv", ",", ('productID, 'price, 'quantity)).read

but in that case I would need to know the "schema" of my CSV file, which I don't, given that there may be arbitrarily many Param/Val entries per line.
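
One way to sidestep the fixed schema might be to treat each line as plain text and split it by hand. Below is a minimal sketch of that per-line step in plain Scala, with no Scalding dependency. The names `ParamValSplitter` and `explode` are made up for illustration, and the code assumes that fields never contain quoted or embedded commas and that the first two tokens are always Id and Name.

```scala
object ParamValSplitter {

  // Turn one raw line such as "1,Cat,Hair,White,Eye,Blue" into one
  // (id, name, param, value) row per Param/Val pair.
  // Assumption: no quoted fields and no commas inside values.
  def explode(line: String): List[(String, String, String, String)] = {
    // limit = -1 keeps trailing empty tokens instead of dropping them
    val tokens = line.split(",", -1).map(_.trim).toList

    tokens match {
      case id :: name :: rest =>
        // Walk the remaining tokens two at a time; a dangling token
        // without a partner is silently skipped.
        rest.grouped(2).collect {
          case List(param, value) => (id, name, param, value)
        }.toList

      // Fewer than two tokens: nothing usable on this line
      case _ => Nil
    }
  }
}
```

For the sample line in the question, `ParamValSplitter.explode("1,Cat,Hair,White,Eye,Blue")` yields the rows `("1","Cat","Hair","White")` and `("1","Cat","Eye","Blue")`. In a Scalding job, the same function could presumably be applied by reading the file as raw lines (for example with `TextLine` rather than `Csv`) and calling it inside a `flatMap`. That wiring is not shown here and is untested, and the header line would need to be skipped separately.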
