I am writing a Scalding job to transform data in the following format:
Id,Name,Param1,Val1,Param2,Val2,...,ParamX,ValX
1,Cat,Hair,White,Eye,Blue...
Into:
Id,Name,Param,Val
1,Cat,Hair,White
1,Cat,Eye,Blue
My problem is that I don't know how many Param/Val pairs a given line contains; I only know that each line is comma-separated. How can I write a Scalding/MapReduce job to transform my data?
Everything I have read recommends doing the following:
Csv("data.csv", ",", ('productID, 'price, 'quantity)).read
but in that case I would need to know the "schema" of my CSV file, which I don't, given that there may be arbitrarily many Param/Val entries per line.
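To make the intended transformation concrete, here is a minimal sketch of the per-line logic in plain Scala, with no Scalding involved. The names `ExplodePairs` and `explode` are hypothetical, and the sketch assumes a naive split on commas (no quoted fields containing commas). Presumably something like this would be applied in a flatMap over lines read as raw text (for example via TextLine) instead of Csv, but wiring it into the job is the part I am unsure about.

```scala
object ExplodePairs {
  // Hypothetical helper: turn one "Id,Name,Param1,Val1,..." line into
  // one (id, name, param, value) tuple per Param/Val pair.
  def explode(line: String): List[(String, String, String, String)] = {
    // Assumption: fields never contain embedded commas, so a plain split is enough.
    // The -1 limit keeps trailing empty fields instead of silently dropping them.
    val fields = line.split(",", -1).map(_.trim).toList
    fields match {
      case id :: name :: rest =>
        // Walk the remaining fields two at a time; an unpaired trailing
        // field does not match List(param, value) and is skipped.
        rest.grouped(2).collect {
          case List(param, value) => (id, name, param, value)
        }.toList
      case _ =>
        // Fewer than two leading fields: nothing to emit.
        Nil
    }
  }

  def main(args: Array[String]): Unit = {
    explode("1,Cat,Hair,White,Eye,Blue").foreach {
      case (id, name, param, value) => println(s"$id,$name,$param,$value")
    }
  }
}
```

Run on the sample line above, this prints the two rows from the desired output (1,Cat,Hair,White and 1,Cat,Eye,Blue).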