spark: exec: "executor": executable file not found in $PATH: unknown


I am trying to do some computations using petastorm v0.11.4 in a Docker container on minikube v1.25.2.

As long as I run the process locally, everything works as expected. As soon as I try to distribute the work across the minikube cluster, I receive the following error message from the kubelet:

Error: failed to start container "spark-kubernetes-executor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "executor": executable file not found in $PATH: unknown

The executor pods then terminate and new ones are created.
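To see whether any executor ever picks up work (rather than only the driver), I run a small check like the one below once a session exists. This is just a diagnostic sketch: socket.gethostname() reports the host each task runs on.

import socket
from pyspark.sql import SparkSession

# Diagnostic sketch: reuse (or create) the active session and list the
# hostnames that actually execute tasks. Locally this is only the driver's
# hostname; on the cluster it should include the executor pods.
spark = SparkSession.builder.getOrCreate()
hosts = (spark.sparkContext
         .parallelize(range(100), 8)
         .map(lambda _: socket.gethostname())
         .distinct()
         .collect())
print("Task hosts:", hosts)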

The code looks as follows:

from pyspark import SparkConf
from pyspark.sql import SparkSession

spark_conf = SparkConf()
spark_conf.setMaster("k8s://https://kubernetes.default:443")
spark_conf.setAppName("PetastormDsCreator")
spark_conf.set(
    "spark.driver.memory",
    "2g"
)
# k8s conf can be read here: https://spark.apache.org/docs/latest/running-on-kubernetes.html
spark_conf.set(
    "spark.kubernetes.namespace",
    "spark"
)
spark_conf.set(
    "spark.kubernetes.authenticate.driver.serviceAccountName",
    "spark-driver"
)
spark_conf.set(
    "spark.kubernetes.authenticate.caCertFile",
    "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
)
spark_conf.set(
    "spark.kubernetes.authenticate.oauthTokenFile",
    "/var/run/secrets/kubernetes.io/serviceaccount/token"
)
spark_conf.set(
    "spark.executor.instances",
    "2"
)
spark_conf.set(
    "spark.driver.host",
    "petastorm-ds-creator" #must match the pods name =)
)
spark_conf.set(
    "spark.driver.port",
    "20022"
)
spark_conf.set(
    "spark.kubernetes.container.image",
    "localhost:5000/petastorm:v0.0.1"
)
spark_conf.set(
    "spark.kubernetes.driver.volumes.hostPath.data.mount.path", #spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path
    "/data"
)
spark_conf.set(
    "spark.kubernetes.executor.volumes.hostPath.data.mount.path", #spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path
    "/data"
)
spark_conf.set(
    "spark.kubernetes.driver.volumes.hostPath.data.options.path", #spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path
    "/data"
)
spark_conf.set(
    "spark.kubernetes.executor.volumes.hostPath.data.options.path", #spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path
    "/data"
)
spark = SparkSession.builder.config(conf=spark_conf).getOrCreate()

sc = spark.sparkContext

t = sc.parallelize(range(10))
r = t.sumApprox(3)
print('Approximate sum: %s' % r)
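For comparison, the local run that works looks roughly like this (a minimal sketch; the local[*] master and the app name are just placeholders, and everything Kubernetes-related is left out):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Local comparison run: no Kubernetes master, no container image settings.
local_conf = SparkConf().setMaster("local[*]").setAppName("PetastormLocalCheck")
spark = SparkSession.builder.config(conf=local_conf).getOrCreate()

t = spark.sparkContext.parallelize(range(10))
print('Approximate sum: %s' % t.sumApprox(3))
spark.stop()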

Did anyone face a similar issue? Unfortunately, I did not find many tutorials explaining how to configure or use petastorm on Kubernetes.
