On which Hadoop node do the Scalding pre-processing and post-processing steps below run?

I have the example code below, which does some pre-processing before a Scalding job runs and some post-processing after it finishes. Because the pre- and post-processing steps call a MySQL database, I need to know on which Hadoop nodes they could potentially run, so that I can open the database port to those nodes. Could they run on any Hadoop datanode? I tried researching this but found no indication. How can I determine, from the documentation or the source code, which node they will run on? (PS: the jobs are scheduled with Oozie.)

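  // Imports this snippet relies on (Scalding's Tool and Hadoop's option parser)
  import com.twitter.scalding.Tool
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.util.GenericOptionsParser

  preProcessingBeforeJobRuns() // On which Hadoop node does this run? Could it run on any datanode?
  log.info(s"ABOUT TO RUN JOB with input $jobInput")
  val scaldingTool = new Tool
  scaldingTool.setJobConstructor(createJob(jobInput))
  // Parse generic Hadoop options (-D, -conf, ...) into the Configuration used by the tool
  val parser: GenericOptionsParser = new GenericOptionsParser(new Configuration(), args)
  scaldingTool.setConf(parser.getConfiguration)
  log.info(s"CALLING SCALDING RUN with args: ${args.toList.mkString(" ")}")
  val status = scaldingTool.run(args) // submits the job and waits for it to finish
  log.info("FINISHED RUNNING JOB!")
  somePostJobProcessing() // On which Hadoop node does this run? Could it run on any datanode?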

Best answer, by Dan Osipov:

The code you've posted runs on the Hadoop master node, in the driver process that launches the job; it does not run inside the map or reduce tasks. scaldingTool.run(args) submits your job, and that job's tasks then execute on the task nodes.
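A practical way to confirm where the pre- and post-processing actually run in your own deployment (for example under Oozie, where the launching process may not sit on the machine you expect) is to log the hostname from those hooks. The following is a minimal, dependency-free Scala sketch. The object name WhereDoesThisRun and the bodies of preProcessingBeforeJobRuns and somePostJobProcessing are hypothetical stand-ins for the question's methods, and the Scalding call is left as a comment.

```scala
import java.net.InetAddress
import scala.util.Try

object WhereDoesThisRun {
  // Hostname of the machine running this JVM; falls back to "unknown-host"
  // if the local hostname cannot be resolved.
  def currentHost(): String =
    Try(InetAddress.getLocalHost.getHostName).getOrElse("unknown-host")

  // Hypothetical stand-in for the question's pre-processing hook:
  // it only reports which host it executes on.
  def preProcessingBeforeJobRuns(): String = {
    val host = currentHost()
    println(s"[pre-process] running on $host")
    host
  }

  // Hypothetical stand-in for the question's post-processing hook.
  def somePostJobProcessing(): String = {
    val host = currentHost()
    println(s"[post-process] running on $host")
    host
  }

  def main(args: Array[String]): Unit = {
    val before = preProcessingBeforeJobRuns()
    // scaldingTool.run(args) would go here; the job's map/reduce tasks run elsewhere.
    val after = somePostJobProcessing()
    // Both hooks run in the same driver JVM, so they report the same host.
    println(s"same host: ${before == after}")
  }
}
```

Running the scheduled workflow a few times with this logging shows whether the driver always lands on the same host (open the MySQL port to that host) or moves between cluster nodes (in which case every candidate node needs access).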