Apache Spark 3.0 with HDP 2.6 stack


We are planning to set up Apache Spark 3.0 outside of our existing HDP 2.6 cluster and submit jobs through that cluster's YARN (v2.7) without upgrading or modifying it. Users currently run Spark 2.3, which is included in the HDP stack. The goal is to enable Apache Spark 3.0 outside of the HDP cluster without interrupting the current jobs.

What is the best approach for this? Should we set up Apache Spark 3.0 client nodes outside of the HDP cluster and submit jobs from those new client nodes?

Any recommendations? What should we avoid so that we do not conflict with the current HDP stack and its components?


There is 1 answer

mpkd567 On

I built Spark 3.0.1 from the 3.0.1 source code against the Hadoop and Hive versions that ship with HDP 2.6, then deployed it on the HDP client nodes only. The pre-built Spark 3.0.1 binaries had compatibility issues with Hive 1.2.1 because they are built against a newer Hive version.

Build options:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive-1.2 -Phive-thriftserver -DskipTests -Dmaven.test.skip=true clean package
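
Once the build is deployed on a client node, submitting to the existing YARN cluster could look roughly like the sketch below. It is not taken from the original setup: the install directory, the application class and jar, and the HDP version string are all placeholders. It also assumes the cluster's Hadoop configs reference `${hdp.version}`, which is common on HDP and is why the value is passed as a JVM option. The script only prints the command (a dry run), so it can be reviewed before anything is run against the cluster.

```shell
#!/usr/bin/env bash
# Sketch: build a spark-submit command for a separately installed Spark 3.0.1
# targeting the existing HDP 2.6 YARN cluster. Every default below is an
# assumption; replace it with the value from your own environment.

SPARK3_HOME="${SPARK3_HOME:-/opt/spark-3.0.1}"          # hypothetical install dir
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"  # assumed HDP client config dir
HDP_VERSION="${HDP_VERSION:-2.6.x.x-xxx}"               # placeholder, not a real version

build_submit_cmd() {
  # Prints the full command line; extra arguments (class, jar, app args)
  # are appended unchanged.
  printf '%s ' \
    "SPARK_HOME=$SPARK3_HOME" \
    "HADOOP_CONF_DIR=$HADOOP_CONF_DIR" \
    "$SPARK3_HOME/bin/spark-submit" \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.driver.extraJavaOptions=-Dhdp.version=$HDP_VERSION" \
    --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=$HDP_VERSION" \
    "$@"
  printf '\n'
}

# Dry run: com.example.MyApp and /path/to/app.jar are hypothetical.
build_submit_cmd --class com.example.MyApp /path/to/app.jar
```

Pointing `SPARK_HOME` at the new install only for this command leaves the stack's Spark 2.3 untouched for existing users. Both `extraJavaOptions` settings are included so the same line works in either deploy mode: the driver option covers cluster mode, and the `spark.yarn.am` option applies to the application master in client mode.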