Reading a Ctrl-A delimited file in Scalding

I'm trying to read a Ctrl-A delimited file in Scalding. I get an error saying it found the wrong number of fields (expecting 166, found 142), followed by the line it was trying to read. For some reason, it does not seem to recognize the delimiter in the first third of the file. Here is the code I am using:

Csv(args("input"), separator = "\u0001", fields = schema)
    .read
    .groupBy('var2){group => group.sum[Long]('var3)}
    .write(Tsv(args("output")))

I'm new to Scalding, so maybe I am using the Csv source incorrectly or inappropriately. Any ideas on why this might be happening?

1 answer

technotring

I would suggest looking at the line on which it errors and checking whether any control characters are embedded in the field values. I did a quick test reading a file delimited by this control character (U+0001, start of heading) and was able to read it fine. So I suggest taking a look at the data; if possible, please provide some sample data.
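
One way to inspect the data outside of the job is a small standalone check like the one below. This is only a sketch in plain Scala, not part of Scalding: the names `DelimiterCheck`, `fieldCount`, `strayControlChars` and `badLines` are made up for illustration, and it assumes the input is a local text file with one record per line and that 166 is the expected field count, as in the question. It counts the Ctrl-A separators on each line and lists any other control characters it finds on lines with an unexpected count.

```scala
import scala.io.Source

// Diagnostic sketch in plain Scala (no Scalding dependency).
// All names in this object are hypothetical helpers for illustration.
object DelimiterCheck {
  // Ctrl-A (U+0001, start of heading), the separator used in the question.
  val Sep: Char = '\u0001'

  // Number of fields = number of separators + 1 (empty fields are counted too).
  def fieldCount(line: String): Int = line.count(_ == Sep) + 1

  // Hex codes of control characters in the line other than the separator.
  def strayControlChars(line: String): List[String] =
    line.filter(c => c.isControl && c != Sep).map(c => f"U+${c.toInt}%04X").toList

  // (1-based line number, fields found, stray control chars) for each line
  // whose field count differs from the expected one.
  def badLines(lines: Iterator[String], expected: Int): Iterator[(Int, Int, List[String])] =
    lines.zipWithIndex.collect {
      case (line, idx) if fieldCount(line) != expected =>
        (idx + 1, fieldCount(line), strayControlChars(line))
    }

  def main(args: Array[String]): Unit = {
    // Assumed arguments: a local input path and the expected field count.
    val path = args(0)
    val expected = args(1).toInt
    val source = Source.fromFile(path)
    try {
      badLines(source.getLines(), expected).foreach { case (lineNo, found, stray) =>
        println(s"line $lineNo: found $found fields, expected $expected, " +
          s"other control chars: ${stray.mkString(", ")}")
      }
    } finally source.close()
  }
}
```

If this reports the same 142 fields for the failing line, the separators really are missing or replaced in the data. If it reports 166, the line itself is fine and the problem is more likely in how the source parses it.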