Run both batch and real time jobs on Spark with jobserver


I have a Spark job that runs every day as part of a pipeline and performs simple batch processing: say, adding a column to a DataFrame that holds another column's value squared (old DF: x; new DF: x, x^2).

I also have a front-end app that consumes these two columns. I want to let my users edit x and get the answer from the same code base. Since the batch job is already written in Spark, I looked for a way to do that against my Spark cluster and ran into Spark Jobserver, which I thought might help here.

My questions:

  1. Can Spark Jobserver support both batch and single-record processing?
  2. Can I use the same Jobserver-compatible JAR to run a Spark job on AWS EMR?
  3. I'm open to hearing about other tools that can help in such a use case.

Thanks!

There is 1 answer

Valentina
  1. I'm not sure I fully understood your scenario, but with Spark Jobserver you can configure your batch jobs and pass different parameters to them.
  2. Yes, once you have a Jobserver-compatible JAR, you should be able to use it with Jobserver running against Spark in standalone mode, on YARN, or on EMR. Keep in mind that you will need to set up Jobserver on EMR, and the open-source documentation seems to be a bit outdated at the moment.
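
To illustrate the "pass different parameters" point, here is a rough sketch of how the same uploaded JAR could be triggered in two ways through Jobserver's REST API. It only builds the HTTP requests and does not send them. The `/jobs` endpoint and its `appName`, `classPath`, `context` and `sync` query parameters are as I remember them from the Jobserver README, so check them against your version. The host and port, the app name `squares`, the class `com.example.SquareJob`, the context name, and the config keys (`mode`, `input.path`, `x`) are all made up for the example; your job would have to read them itself.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_job_request(base_url, app_name, class_path, job_config,
                      context=None, sync=False):
    """Build (but do not send) a POST request for Jobserver's /jobs endpoint."""
    query = {"appName": app_name, "classPath": class_path}
    if context is not None:
        # Run on an already-created context instead of starting a new one.
        query["context"] = context
    if sync:
        # Ask Jobserver to hold the connection until the job result is ready.
        query["sync"] = "true"
    url = f"{base_url.rstrip('/')}/jobs?{urlencode(query)}"
    # The request body carries the job's parameters as "key = value" lines.
    body = "\n".join(f"{key} = {value}" for key, value in job_config.items())
    return Request(url, data=body.encode("utf-8"), method="POST")


# Nightly batch run over a whole dataset (hypothetical path and keys).
batch = build_job_request(
    "http://localhost:8090", "squares", "com.example.SquareJob",
    {"mode": "batch", "input.path": '"s3://my-bucket/daily/"'},
)

# Interactive run for one edited value of x, on a pre-created context.
single = build_job_request(
    "http://localhost:8090", "squares", "com.example.SquareJob",
    {"mode": "single", "x": 3},
    context="interactive-ctx", sync=True,
)

print(batch.full_url)
print(single.full_url)
```

To actually submit one of these you would pass it to `urllib.request.urlopen`. For the interactive case, a long-running context is probably what you want, since starting a fresh Spark context per request would likely dominate the response time.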