Connecting to a remote Hadoop cluster (CDP) from a Linux server


I'm new to PySpark, and I want to connect to a remote Hadoop cluster (CDP) from a Linux server using the spark-submit command.

Any help would be appreciated.

I need a spark-submit command that connects to the remote CDP cluster.



Answer by ozlemg

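You can use Apache Livy to submit jobs to a remote CDP cluster. Livy is a REST service that accepts Spark job submissions over HTTP and launches them on the cluster, so the Linux server you submit from only needs network access to Livy. To set it up, download and unzip Livy, add the following lines to its livy.conf file, and then start the Livy service (bin/livy-server start).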

livy.spark.master = yarn
livy.spark.deploy-mode = cluster
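Once Livy is running, a PySpark script is submitted by POSTing JSON to Livy's /batches endpoint instead of calling spark-submit locally. The sketch below uses only the Python standard library. It assumes a Livy server on its default port 8998 at a hypothetical host (livy-host.example.com), and the HDFS paths and job name in the usage example are hypothetical too. The script passed as "file" must be readable by the cluster (for example on HDFS), not just on your Linux machine. The optional Authorization header is an assumption about your setup; whether and how you authenticate depends on how the cluster's Livy endpoint is secured.

```python
import json
import urllib.request

# Hypothetical Livy endpoint (assumption); Livy listens on port 8998 by default.
LIVY_URL = "http://livy-host.example.com:8998"


def build_batch_payload(py_file, args=None, conf=None, name=None):
    """Build the JSON body for Livy's POST /batches endpoint.

    `py_file` must be a path the cluster can read (e.g. on HDFS),
    not a path on the local Linux machine.
    """
    payload = {"file": py_file}
    if args:
        payload["args"] = list(args)  # command-line arguments for the script
    if conf:
        payload["conf"] = dict(conf)  # Spark configuration properties
    if name:
        payload["name"] = name  # display name of the batch
    return payload


def submit_batch(livy_url, payload, auth_header=None):
    """POST the payload to {livy_url}/batches and return Livy's JSON reply."""
    headers = {"Content-Type": "application/json"}
    if auth_header:
        # Only needed if your Livy endpoint requires authentication (assumption).
        headers["Authorization"] = auth_header
    req = urllib.request.Request(
        f"{livy_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_batch_payload(
        "hdfs:///user/me/jobs/wordcount.py",  # hypothetical script location
        args=["hdfs:///user/me/input.txt"],  # hypothetical input path
        conf={"spark.executor.memory": "2g"},
        name="wordcount-from-linux",
    )
    print(json.dumps(body))
    # Uncomment once LIVY_URL points at a real Livy server:
    # print(submit_batch(LIVY_URL, body))
```

Keeping payload construction separate from the HTTP call makes it easy to inspect the JSON before anything is sent. The reply from /batches includes the batch "id", which you can use to check on the job afterwards.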

You can find examples of submitting Spark jobs through Livy at the following links:

  1. https://community.cloudera.com/t5/Community-Articles/Submit-a-Spark-Job-to-CDP-Data-Hub-using-the-Livy-REST-API/ta-p/322481
  2. https://livy.apache.org/examples/
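After a batch is submitted, its progress can be followed with Livy's GET /batches/{batchId}/state endpoint. The sketch below polls that endpoint until the batch reaches a state treated here as terminal. The set of terminal states, the poll interval, and the timeout are assumptions for illustration; check them against the Livy REST API documentation for your version. The state lookup is passed in as a callable so the polling loop can be exercised without a live server.

```python
import json
import time
import urllib.request

# States treated as "finished" in this sketch (assumption; check your Livy version).
TERMINAL_STATES = {"success", "dead", "killed", "error"}


def get_batch_state(livy_url, batch_id):
    """GET {livy_url}/batches/{batch_id}/state and return the state string."""
    with urllib.request.urlopen(f"{livy_url}/batches/{batch_id}/state") as resp:
        return json.loads(resp.read().decode("utf-8"))["state"]


def wait_for_batch(fetch_state, batch_id, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll until the batch reaches a terminal state or the timeout expires.

    `fetch_state` is any callable that takes a batch id and returns its state,
    e.g. functools.partial(get_batch_state, "http://livy-host.example.com:8998").
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state(batch_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {state!r} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

For example, `wait_for_batch(functools.partial(get_batch_state, LIVY_URL), batch_id)` would block until the job ends and return its final state; a result other than "success" means you should look at the batch's logs on the cluster.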