I am trying to run a Spark/Java application on Kubernetes (via minikube) using the spark-operator. I am a bit confused about what I should put in the Dockerfile so that the image can be built and executed via the spark-operator.
Sample spark-operator.yaml:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app
  namespace: default
spec:
  type: Java
  mode: cluster
  image: docker/repo/my-spark-app-image
  mainApplicationFile: local:///opt/app/my-spark-app.jar
```
As shown above, the spark-operator YAML only requires the jar and the image location. So, do I just need the lines below in my Dockerfile? Is there a sample Dockerfile available that I can refer to?
Dockerfile:

```dockerfile
FROM openjdk:11-jre-alpine
COPY target/*.jar /opt/app/csp_auxdb_refresh.jar
COPY src/main/resources/* /opt/app/
```
In the Dockerfile you have provided, neither Spark nor any other dependencies are installed. To quickly get started, use `gcr.io/spark-operator/spark:v3.1.1` as the base for your image, i.e. change the `FROM` statement to `FROM gcr.io/spark-operator/spark:v3.1.1` and build again. There is a great guide on how to get started with the spark-operator in their GitHub repo (here).
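Putting that together, a minimal Dockerfile for your setup could look like the sketch below. It assumes your build produces a single jar under `target/` and that the jar name matches the `mainApplicationFile` path in your manifest (`/opt/app/my-spark-app.jar`); adjust both to your actual layout:

```dockerfile
# Base image with Spark 3.1.1 and its runtime dependencies preinstalled
FROM gcr.io/spark-operator/spark:v3.1.1

# Copy the application jar to the path referenced by mainApplicationFile
COPY target/*.jar /opt/app/my-spark-app.jar

# Copy any runtime resources (configs, properties files) next to the jar
COPY src/main/resources/ /opt/app/
```

After building and pushing the image, you submit the application by applying the manifest, e.g. `kubectl apply -f spark-operator.yaml`, and can check its status with `kubectl get sparkapplications -n default`.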