I have a Scalding job that looks like this:

```scala
import com.twitter.scalding.{Args, Csv, Job, TextLine}

class DataJob(args: Args) extends Job(args) {
  val input = args("input")
  val output = Csv(args("output"), separator = ",")

  def parseLine(x: String): Seq[(String, String, String, String)] = {
    List(("a", "b", "c", "d")) // Returns a list of tuples, not a single tuple
  }

  TextLine(input).mapTo('line -> ('v1, 'v2, 'v3, 'v4)) {
    x: String => {
      parseLine(x) // this code fails with an arity error
    }
  }.write(output)
}
```
When it runs, I get the following error:

```
Caused by: java.lang.AssertionError: assertion failed: Arity of (class com.twitter.scalding.LowPriorityTupleSetters$$anon$2) is 1, which doesn't match: + ('v1', 'v2', 'v3', 'v4')
```
This happens because my parseLine function returns a list of tuples, while mapTo expects the function to return a single tuple for each input line. How can I get this code to work?
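To make the mismatch concrete, here is a rough analogy using plain Scala collections rather than Scalding. It assumes that mapTo behaves like map and flatMapTo like flatMap in terms of how many output rows each input line produces; the object name MapVsFlatMap and the sample lines are made up for illustration.

```scala
object MapVsFlatMap {
  // Same shape as parseLine in the job: one input line -> a list of 4-tuples
  def parseLine(x: String): Seq[(String, String, String, String)] =
    List(("a", "b", "c", "d"))

  // Hypothetical sample input, standing in for the lines of the text file
  val lines: List[String] = List("line one", "line two")

  // Like mapTo: exactly one output value per input line.
  // Each value is a whole Seq, i.e. a single field rather than four.
  val mapped: List[Seq[(String, String, String, String)]] = lines.map(parseLine)

  // Like flatMapTo: every element of the returned Seq becomes its own row,
  // so each row is a 4-tuple.
  val flatMapped: List[(String, String, String, String)] = lines.flatMap(parseLine)

  def main(args: Array[String]): Unit = {
    println(mapped)     // List(List((a,b,c,d)), List((a,b,c,d)))
    println(flatMapped) // List((a,b,c,d), (a,b,c,d))
  }
}
```

With map, the "row" for each line is the whole list, which is why a four-field output cannot be filled from it.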
OK, it looks like I just needed to change:

```scala
TextLine(input).mapTo('line -> ('v1, 'v2, 'v3, 'v4))
```

to:

```scala
TextLine(input).flatMap('line -> ('v1, 'v2, 'v3, 'v4))
```
I'm still not entirely clear on why, so any explanation would be appreciated!
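As I understand Scalding's Fields API, mapTo takes a function A => T plus an implicit TupleSetter[T], and the setter's arity must equal the number of output fields. flatMapTo instead takes A => TraversableOnce[T], so the setter is looked up for the element type. For parseLine, mapTo's T is the whole Seq, which is not a recognized tuple, so the compiler falls back to a single-value setter of arity 1 (the LowPriorityTupleSetters class named in the error). The sketch below is a hypothetical, heavily simplified model of that layered-implicit mechanism; Setter, single, tuple4, mapToArity and flatMapToArity are invented names, not Scalding's API.

```scala
// Hypothetical, simplified stand-in for Scalding's TupleSetter type class
trait Setter[T] { def arity: Int }

trait LowPrioritySetters {
  // Fallback for any type that is not a recognized tuple: treated as ONE field
  implicit def single[A]: Setter[A] = new Setter[A] { val arity = 1 }
}

object Setter extends LowPrioritySetters {
  // Preferred over `single` for 4-tuples because it is defined in the derived object
  implicit def tuple4[A, B, C, D]: Setter[(A, B, C, D)] =
    new Setter[(A, B, C, D)] { val arity = 4 }
}

object ArityDemo {
  def parseLine(x: String): Seq[(String, String, String, String)] =
    List(("a", "b", "c", "d"))

  // mapTo-like: the function's whole return type T must fill the output fields
  def mapToArity[T](fn: String => T)(implicit s: Setter[T]): Int = s.arity

  // flatMapTo-like: each element T of the returned collection fills the output fields
  def flatMapToArity[T](fn: String => Iterable[T])(implicit s: Setter[T]): Int = s.arity

  // T = Seq[(String, String, String, String)] -> only `single` applies -> arity 1
  val viaMap: Int = mapToArity((x: String) => parseLine(x))

  // T = (String, String, String, String) -> `tuple4` is chosen -> arity 4
  val viaFlatMap: Int = flatMapToArity((x: String) => parseLine(x))

  def main(args: Array[String]): Unit =
    println(s"mapTo-style arity: $viaMap, flatMapTo-style arity: $viaFlatMap")
}
```

An arity of 1 against four declared fields is exactly the assertion in the error. One further difference worth noting: flatMap keeps the input fields (for TextLine, 'offset and 'line) and appends the new ones, so they would also end up in the CSV, whereas flatMapTo keeps only 'v1 to 'v4, which is the closer equivalent of the original mapTo.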