How to use external Spark with the Cloudera cluster?


I need to install Spark on a host that is not part of the Cloudera cluster and use it to submit Spark jobs to that cluster.

Is it possible to use Spark this way? If so, how should it be configured?
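To clarify the goal, this is roughly the kind of invocation I want to be able to run from the external host (assuming the cluster uses YARN as the resource manager; the class name and jar path are just placeholders):

    # Hypothetical job submitted from the external host to the cluster's YARN
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyJob \
      /path/to/my-job.jar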

What I've already tried:

1. Downloaded "https://www.apache.org/dyn/closer.lua/spark/spark-3.3.4/spark-3.3.4-bin-hadoop3.tgz" and extracted it on the external host

2. Copied the "conf" files from the Cloudera cluster to the new Spark directory

3. Exported the variables "HADOOP_CONF_DIR", "SPARK_CONF_DIR" and "SPARK_HOME", pointing them at the new "spark-3.3.4-bin-hadoop3" directory containing the copied files (the exact commands are sketched after the log output below)

4. When trying to run spark-shell as a test, it prints the banner and then hangs, as shown below:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.4
      /_/

Using Scala version 2.13.8 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.16.1)
Type in expressions to have them evaluated.
Type :help for more information.
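For reference, this is roughly what steps 1-4 look like as commands on the external host. The gateway hostname and the /etc/hadoop/conf and /etc/hive/conf source paths are assumptions about a typical Cloudera layout, and the target directory is just an example:

    # Step 1: unpack the downloaded distribution
    tar -xzf spark-3.3.4-bin-hadoop3.tgz -C /opt

    # Step 2: copy the cluster client configuration from a cluster gateway node
    # (hostname and source paths are examples)
    mkdir -p /opt/spark-3.3.4-bin-hadoop3/conf/cluster-conf
    scp cluster-gateway:/etc/hadoop/conf/*.xml /opt/spark-3.3.4-bin-hadoop3/conf/cluster-conf/
    scp cluster-gateway:/etc/hive/conf/hive-site.xml /opt/spark-3.3.4-bin-hadoop3/conf/cluster-conf/

    # Step 3: point Spark and Hadoop at the new directory and the copied configs
    export SPARK_HOME=/opt/spark-3.3.4-bin-hadoop3
    export SPARK_CONF_DIR=$SPARK_HOME/conf
    export HADOOP_CONF_DIR=$SPARK_HOME/conf/cluster-conf
    export PATH=$SPARK_HOME/bin:$PATH

    # Step 4: start the shell against the cluster's YARN explicitly
    spark-shell --master yarn --deploy-mode client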

Note: the cluster uses Kerberos, so kinit was run before starting spark-shell.
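For completeness, this is roughly the Kerberos part (the principal and keytab are placeholders; the --principal/--keytab variant is only an option I am considering, not something I have confirmed fixes the hang):

    # Obtain a ticket before starting the shell (placeholder principal/keytab)
    kinit -kt /path/to/myuser.keytab myuser@EXAMPLE.REALM
    klist

    # Alternative: let Spark manage the ticket itself via its built-in options
    spark-shell --master yarn \
      --principal myuser@EXAMPLE.REALM \
      --keytab /path/to/myuser.keytab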
