What to set Spark Master address to when deploying on Kubernetes Spark Operator?


The official Spark documentation only covers the spark-submit method for deploying code to a Spark cluster. It says we must prefix the Kubernetes API server address with k8s://. What should we do when deploying through the Spark Operator?

For instance, if I have a basic PySpark application that starts up like this, how do I set the master:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

sc = SparkContext("local", "Big data App")
spark = SQLContext(sc)
spark_conf = SparkConf().setMaster('local').setAppName('app_name')

Here I have local; if I were running on a non-Kubernetes cluster I would set the master address with a spark:// prefix, or use yarn. Must I also use the k8s:// prefix when deploying through the Spark Operator? If not, what should be used for the master parameter?


1 Answer

Answered by Alex Ott

It's better not to use setMaster in the code; instead, specify the master when running the code via spark-submit, something like this (see the documentation for details):

./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    your_script.py

I haven't used the Spark Operator, but as I understand from its documentation, it should set the master automatically.
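In practice that means your application code can leave the master unset and simply pick up whatever the submission environment injects. A minimal sketch (the app name is a placeholder, and the print is only there for illustration):

from pyspark.sql import SparkSession

# No setMaster() here: spark-submit / the Spark Operator supplies spark.master
spark = SparkSession.builder.appName("app_name").getOrCreate()

# On Kubernetes this should print something like k8s://https://<apiserver-host>:<port>
print(spark.sparkContext.master)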

You also need to convert this code:

sc = SparkContext("local", "Big data App")
spark = SQLContext(sc)
spark_conf = SparkConf().setMaster('local').setAppName('app_name')

to the more modern API (see the documentation):

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

as SQLContext is deprecated.
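Once you have the session, what you previously did through SQLContext (and the SparkContext) is available on it directly; a small sketch with made-up sample data:

# The old SparkContext is still reachable if you need it
sc = spark.sparkContext

# DataFrame creation and SQL now go through the session itself
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.createOrReplaceTempView("t")
spark.sql("SELECT id, value FROM t WHERE id = 1").show()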

P.S. I recommend working through the first chapters of Learning Spark, 2nd edition, which is freely available from the Databricks site.