In my Scalding map reduce code, I want to log out certain steps that are happening so that I can debug the map-reduce jobs if something goes wrong.
How can I add logging to my scalding job?
E.g.
import com.twitter.scalding._
class WordCountJob(args: Args) extends Job(args) {
//LOG: Starting job at time blah..
TextLine( args("input") )
.read
.flatMap('line -> 'word) {
line: String =>
line.trim.toLowerCase.split("\\W+")
}
.groupBy('word) { group => group.size('count) }
}
.write(Tsv(args("output")))
//LOG - ending job at time...
}
Any logging framework will do. You can obviously also use println() - it will appear in your job's stdout log file in the job history of your hadoop cluster (in hdfs mode) or in your console (in local mode).
Also consider defining a trap with the addTrap() method for catching erroneous records.