Hello everyone, I have a problem with Apache Spark (version 3.3.1) on k8s.
In short: When I run the statement
print(sc.uiWebUrl)
within a pod, I want to get a URL that is accessible from outside the k8s cluster.
Something like:
http://{{my-ingress-host}}
Long story:
I want to create a workspace for Apache Spark on k8s, where the driver's pod is the workspace that I work in. I want to let the client run Apache Spark either with pyspark-shell or with the pyspark Python library.
Either way, I want the UI's web URL to be one that is accessible from the outside world (outside the k8s cluster).
Why? For UX reasons: I want to make my client's life easier.
Because I run on k8s, part of the configuration of my Apache Spark program is:
spark.driver.host={{driver-service}}.{{drivers-namespace}}.svc.cluster.local
spark.driver.bindAddress=0.0.0.0
Because of that, the output of this code:
print(sc.uiWebUrl)
would be:
http://{{driver-service}}.{{drivers-namespace}}.svc.cluster.local:4040
The same address is displayed in the pyspark-shell as well.
So my question is: is there a way to change the host of the UI web URL to a host that I have defined in my ingress, to make my client's life easier?
The new output would then be:
http://{{my-defined-host}}
Other points, to help tailor the solution as closely as possible:
- I don't have an nginx ingress in my k8s cluster. Maybe I have a HAPROXY ingress. But I would like to be coupled to my ingress implementation as little as possible.
- I would prefer that the client needs to configure Apache Spark as little as possible.
- I would prefer that the UI web URL of the Spark context is set when the context is created, meaning before the pyspark-shell displays the welcome screen.
- I have tried messing with the ui.proxy configurations, and it hasn't helped. Sometimes it made things worse.
Thanks in advance, everyone. Any help would be appreciated.
You can change your web UI's host to a host that you want by setting the SPARK_PUBLIC_DNS environment variable. This needs to be done on the driver, since the web UI runs on the driver.
To set the port for the web UI, you can do that using the spark.ui.port config parameter.
So putting both together using spark-submit for example, makes something like the following:
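A minimal sketch of such a command, assembled in Python so that the environment and arguments can be inspected before anything is launched. The host name, application path, and master URL are hypothetical placeholders, and build_spark_submit is an illustrative helper, not part of Spark. It assumes that in client mode the driver inherits the launching process's environment, and that in cluster mode the variable is passed to the driver pod through spark.kubernetes.driverEnv.

```python
import os
import shlex


def build_spark_submit(public_host, ui_port, app_path, master):
    """Return (env, argv) for a spark-submit call whose web UI advertises public_host."""
    # SPARK_PUBLIC_DNS is read by the driver process, so in client mode it
    # goes into the environment of the process that starts the driver.
    env = dict(os.environ, SPARK_PUBLIC_DNS=public_host)
    argv = [
        "spark-submit",
        "--master", master,
        # Port the driver's web UI binds to.
        "--conf", f"spark.ui.port={ui_port}",
        # In cluster mode the driver runs in its own pod, so the variable is
        # also handed over through spark.kubernetes.driverEnv.*
        "--conf", f"spark.kubernetes.driverEnv.SPARK_PUBLIC_DNS={public_host}",
        app_path,
    ]
    return env, argv


# Hypothetical values, for illustration only.
env, argv = build_spark_submit(
    public_host="spark-ui.example.com",
    ui_port=4040,
    app_path="local:///opt/spark/app/main.py",
    master="k8s://https://kubernetes.default.svc",
)

# Print the equivalent shell command line.
print("SPARK_PUBLIC_DNS=" + shlex.quote(env["SPARK_PUBLIC_DNS"]), shlex.join(argv))
# To actually launch it: subprocess.run(argv, env=env, check=True)
```

With these settings sc.uiWebUrl should report the public host followed by the UI port, so the ingress still has to route that host to the driver's UI port. For the pyspark-shell or the pyspark library, exporting SPARK_PUBLIC_DNS in the driver pod's environment before the context is created should have the same effect.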