I have a Spark job that runs every day as part of a pipeline and performs simple batch processing: say, adding a column to a DataFrame containing the square of another column's value (old DF: x; new DF: x, x^2).
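For context, the transformation is roughly the following sketch (the names `df`, `x`, and `x_squared` are illustrative, not from my actual job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("SquareColumn").getOrCreate()
import spark.implicits._

val df = Seq(1.0, 2.0, 3.0).toDF("x")                        // old DF: x
val result = df.withColumn("x_squared", col("x") * col("x")) // new DF: x, x^2
result.show()
```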
I also have a front-end app that consumes these two columns. I want to let my users edit x and get the result from the same code base. Since the batch job is already written in Spark, I looked for a way to achieve this against my Spark cluster and came across spark-jobserver, which I thought might help here.
My questions:
- Can spark-jobserver support both batch processing and low-latency single-record requests?
- Can I use the same jobserver-compatible JAR to run a Spark job on AWS EMR?
- I'm also open to hearing about other tools that could help with this use case.
Thanks!