How do I log to file in Scalding?


In my Scalding map-reduce code, I want to log certain steps as they happen so that I can debug the map-reduce jobs if something goes wrong.

How can I add logging to my scalding job?

E.g.

import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
   // LOG: starting job at time ...
   TextLine(args("input"))
      .read
      .flatMap('line -> 'word) { line: String =>
         line.trim.toLowerCase.split("\\W+")
      }
      .groupBy('word) { group => group.size('count) }
      .write(Tsv(args("output")))
   // LOG: ending job at time ...
}

Answer by Erik Schmiegelow

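Any logging framework will do. You can also simply use println(). Output from code that runs inside your pipe operations (for example, the function passed to flatMap) goes to the task's stdout log file, which you can find in the job history of your Hadoop cluster (in hdfs mode), or to your console (in local mode). Statements placed directly in the body of your Job class run on the machine that submits the job, while the flow is being built, so their output appears in that machine's console.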
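A minimal sketch of the question's job with logging added, assuming SLF4J (`org.slf4j.LoggerFactory`) is on the classpath; another logging API would be used the same way. The class name `LoggingWordCountJob` and its companion object are hypothetical. The logger lives in the companion object so that the function passed to `flatMap` refers to it statically instead of serializing it along with the closure. Note that the class body only builds the flow: it runs on the submitting machine before any records are read, so a "job finished" line placed at the end of the class body would be logged too early.

```scala
import com.twitter.scalding._
import org.slf4j.LoggerFactory

// Hypothetical companion object holding the logger. Closures reach it
// statically, so the logger itself is never shipped to the tasks.
object LoggingWordCountJob {
  val logger = LoggerFactory.getLogger(classOf[LoggingWordCountJob])
}

class LoggingWordCountJob(args: Args) extends Job(args) {
  // Runs on the submitting machine while the flow is planned,
  // before any input has been read.
  LoggingWordCountJob.logger.info(
    s"Planning word count: input=${args("input")}, output=${args("output")}")

  TextLine(args("input"))
    .read
    .flatMap('line -> 'word) { line: String =>
      // Runs inside the map tasks: task logs in hdfs mode,
      // the console in local mode.
      if (line.trim.isEmpty) LoggingWordCountJob.logger.debug("Skipping empty line")
      line.trim.toLowerCase.split("\\W+")
    }
    .groupBy('word) { group => group.size('count) }
    .write(Tsv(args("output")))
}
```

In hdfs mode the `info` line should show up on the client console, while the `debug` lines (if debug logging is enabled) end up in the map tasks' logs.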

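Also consider defining a trap with the addTrap() method to catch erroneous records: records that make an operation in the pipe throw an exception are written to the trap sink instead of failing the whole job.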
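A sketch of `addTrap()` in the fields-based API, using a hypothetical job (`ClickParseJob`, with hypothetical `--input`, `--output` and `--errors` arguments) that parses a click count from a two-column TSV. Rows whose second column is not a number make `toInt` throw; with the trap attached, those rows should be written to the `--errors` sink rather than aborting the job. The trap is added after the operation it is meant to guard, on the assumption that it covers the operations on the pipe branch leading up to it.

```scala
import com.twitter.scalding._

// Hypothetical job: keeps (user, clicks) rows whose clicks column parses as an Int.
class ClickParseJob(args: Args) extends Job(args) {
  Tsv(args("input"), ('user, 'clicks))
    .read
    // Throws NumberFormatException on malformed rows such as ("bob", "oops").
    .map('clicks -> 'clickCount) { clicks: String => clicks.trim.toInt }
    // Failing rows from the operations above go to this sink instead of
    // failing the job, so they can be inspected afterwards.
    .addTrap(Tsv(args("errors")))
    .project('user, 'clickCount)
    .write(Tsv(args("output")))
}
```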