The official Spark documentation only has information on the spark-submit method for deploying code to a Spark cluster. It mentions that we must prefix the Kubernetes API server address with k8s://. What should we do when deploying through the Spark Operator?
For instance, if I have a basic PySpark application that starts up like this, how do I set the master?

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

# The master is currently hard-coded to "local"
sc = SparkContext("local", "Big data App")
spark = SQLContext(sc)
spark_conf = SparkConf().setMaster('local').setAppName('app_name')
Here I have local as the master. If I were running on a non-Kubernetes cluster, I would give the master address with the spark:// prefix, or use yarn. Must I also use the k8s:// prefix when deploying through the Spark Operator? If not, what should be used for the master parameter?
It's better not to use setMaster in the code, but instead to specify the master when running the code via spark-submit, something like this (see the documentation for details):
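A sketch of such a submission against Kubernetes; the API server host and port, container image, and application path are placeholders you would fill in:

./bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name app_name \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/app.py

In the application itself you then create the context without calling setMaster, so whatever master is passed on the command line (or injected by the operator) is picked up.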
I haven't used the Spark Operator myself, but as I understand from its documentation, it should set the master automatically.
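For reference, with the operator you describe the job in a SparkApplication manifest rather than passing a master at all. The sketch below is adapted from the operator's examples; the image, file path, and Spark version are placeholders:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: app-name
  namespace: default
spec:
  type: Python
  mode: cluster
  image: <spark-image>
  mainApplicationFile: local:///path/to/app.py
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    serviceAccount: spark
  executor:
    cores: 1
    instances: 2

Note that there is no master field here: the operator fills it in from the cluster it runs on.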
You also need to convert the SQLContext-based code above to the more modern SparkSession API (see the documentation), as SQLContext is deprecated:
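A minimal sketch of the modern equivalent, reusing the app_name from the question:

from pyspark.sql import SparkSession

# SparkSession is the unified entry point that replaces SQLContext;
# no .master() call here, so spark-submit or the operator supplies it
spark = SparkSession.builder \
    .appName('app_name') \
    .getOrCreate()

# The underlying SparkContext is still available when needed
sc = spark.sparkContext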
P.S. I recommend working through the first chapters of Learning Spark, 2nd edition, which is freely available from the Databricks site.