I have the example code below, which does some pre-processing before a Scalding job runs and some post-processing afterwards. Since the pre-processing and post-processing steps call a MySQL database, I would like to know on which Hadoop nodes they could potentially run (I need to open the port from those nodes to the database). Could Hadoop run the pre-processing and post-processing on any data-node? I did some research but could not find any indication. How can I find out, from the documentation or the sources, on which node this code would run? (PS: the jobs are scheduled with Oozie.)
preProcessingBeforeJobRuns() // On which Hadoop node would this run? Could it run on any datanode?
log.info(s"ABOUT TO RUN JOB with input $jobInput")
val scaldingTool = new Tool
scaldingTool.setJobConstructor(createJob(jobInput))
val parser: GenericOptionsParser = new GenericOptionsParser(new Configuration(), args)
scaldingTool.setConf(parser.getConfiguration)
log.info(s"CALLING SCALDING RUN with args: ${args.toList.mkString(" ")}")
val status = scaldingTool.run(args)
log.info("FINISHED RUNNING JOB!")
somePostJobProcessing() // On which Hadoop node would this run? Could it run on any datanode?
The code you've posted will run on the Hadoop master node. The call to scaldingTool.run(args) submits your job, which in turn launches the tasks that execute on the task nodes.
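Because the jobs are scheduled with Oozie, it may be worth confirming empirically which host actually executes the pre-processing and post-processing steps before opening ports. Below is a minimal sketch that logs the hostname of the JVM running the driver code. The DriverHost object and its method names are hypothetical helpers, not part of Scalding, Hadoop, or Oozie.

```scala
import java.net.InetAddress
import scala.util.Try

// Hypothetical helper (not part of Scalding/Hadoop): reports which host
// the current JVM is running on.
object DriverHost {
  // Hostname of the local machine, or "unknown" if it cannot be resolved.
  def current: String =
    Try(InetAddress.getLocalHost.getHostName).getOrElse("unknown")

  // Builds a log line tagging a processing stage with the local hostname.
  def describe(stage: String): String =
    s"[$stage] running on host: $current"
}
```

You could call log.info(DriverHost.describe("pre-processing")) just before preProcessingBeforeJobRuns() and log.info(DriverHost.describe("post-processing")) just before somePostJobProcessing(), run the workflow a few times, and compare the logged hosts across runs.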