SparkAction for yarn-cluster


Using the Hortonworks HDP 2.3 preview sandbox (oozie:4.2.0.2.3.0.0-2130, spark:1.3 and Hadoop:2.7.1.2.3.0.0-2130), I am trying to invoke the Oozie Spark action with "yarn-cluster" as the master. The example provided in Oozie Spark Action only covers running the Spark action on a "local" master.

The same page also says that, to run on YARN, the Spark assembly jar must be available to the Spark action.

I have two questions:

  • How do we make the Spark assembly jar available to the Spark action? Should I use the jar element in the Oozie Spark action?
  • I get the following error when I submit the job without adding the assembly jar explicitly. What is causing it?

    Using properties file: null
    Using properties file: null
    Parsed arguments:
       master                  yarn-master
       deployMode              cluster
       executorMemory          512m
       executorCores           null
       totalExecutorCores      null
       propertiesFile          null
       extraSparkProperties    Map()
       driverMemory            null
       driverCores             null
       driverExtraClassPath    null
       driverExtraLibraryPath  null
       driverExtraJavaOptions  null
       supervise               false
       queue                   null
       numExecutors            3
       files                   null
       pyFiles                 null
       archives                null
       mainClass               com.foo.bar.spark.examples.WordCountSparkJob
       primaryResource         hdfs://sandbox.hortonworks.com:8020/apps/foo/sandbox.hortonworks.com/1.201-SNAPSHOT/oozieapp/lib/abc-1.201-SNAPSHOT.jar
       name                    Spark Example
       childArgs               [inputpath=hdfs://sandbox.hortonworks.com:8020/tmp/bcp_examples/input/]
       jars                    null
       verbose                 true
    
    Default properties from null:
    Error: Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
    Run with --help for usage help or --verbose for debug output
    Intercepting System.exit(-1)
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [-1]
    

I would appreciate any pointers on how to solve the problem.


There are 2 answers

Simplefish On

The default Spark action sharelib distributed with Oozie in HDP 2.3 is not built with YARN support.

If you installed Spark through the Hortonworks distribution, you can simply replace the contents of the Spark action sharelib with the jars from the installed version.

For example (as the oozie user):

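    # Keep the original sharelib as a backup, then create an empty replacement
    hadoop fs -mv /user/oozie/share/lib/spark /user/oozie/share/lib/spark-bak
    hadoop fs -mkdir /user/oozie/share/lib/spark
    # Upload the jars from the installed Spark client
    hadoop fs -put /usr/hdp/current/spark-client/lib/* /user/oozie/share/lib/spark
    # Restore the oozie* jars from the backup
    hadoop fs -cp /user/oozie/share/lib/spark-bak/oozie* /user/oozie/share/lib/spark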
郭天佑 On

This error occurs because the class org.apache.spark.deploy.yarn.Client cannot be loaded. That class is contained in the spark-assembly jar, which can be found in /usr/hdp/current/spark-client/lib/. After you add this file to hdfs://hd-host:port/user/oozie/share/lib/spark, you have to restart Oozie for the change to take effect immediately.
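
The copy step described in this answer could look roughly like the shell sketch below. It rests on assumptions that are not in the answer: the jar is assumed to match the pattern spark-assembly-*.jar, and the names SPARK_LIB_DIR, SHARELIB_DIR, DRY_RUN, run and copy_assembly are made up for illustration. The two paths are the ones quoted in the answer. By default the script only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: copy the Spark assembly jar into the Oozie Spark sharelib.
# Assumptions (not from the answer): the jar name matches spark-assembly-*.jar;
# SPARK_LIB_DIR, SHARELIB_DIR and DRY_RUN are illustrative variable names.
SPARK_LIB_DIR="${SPARK_LIB_DIR:-/usr/hdp/current/spark-client/lib}"
SHARELIB_DIR="${SHARELIB_DIR:-/user/oozie/share/lib/spark}"
DRY_RUN="${DRY_RUN:-1}"

# In dry-run mode print the command; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Upload every local jar that matches the assumed assembly-jar pattern.
copy_assembly() {
    for jar in "$SPARK_LIB_DIR"/spark-assembly-*.jar; do
        # Skip the literal pattern when nothing matched.
        [ -e "$jar" ] || continue
        run hadoop fs -put "$jar" "$SHARELIB_DIR"
    done
}

copy_assembly
```

Running it once in dry-run mode shows which jar would be uploaded before anything on HDFS changes. Oozie still has to be restarted afterwards, as noted above.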