I have to run some Spark Python scripts as Oozie workflows. I've tested the scripts locally with Spark, but when I submit them to Oozie I can't figure out why they are not working. I'm using the Cloudera VM, and I'm managing Oozie with the Hue dashboard. Here is the workflow configuration for the Spark action:
Spark Master: local[*]
Mode: client
App name: myApp
Jars/py files: hdfs://localhost:8120/user/cloudera/example.py
Main class: org.apache.spark
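For what it's worth, those form fields correspond roughly to the direct spark-submit invocation below (a sketch, not the exact command Oozie builds; the HDFS path is copied from the form, and fetching the script locally first is an assumption, since client/local mode may require a local .py file). Running it from a shell on the VM is a quick way to check the script outside Oozie:

    # Sketch: copy the script down first, then run it locally
    # (path copied verbatim from the Hue form above)
    hdfs dfs -get hdfs://localhost:8120/user/cloudera/example.py /tmp/example.py
    spark-submit --master 'local[*]' --name myApp /tmp/example.py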
I also tried to run a simple example that just prints something (sketched below, after the log), but for every script I submit, Oozie gives me this output:
>>> Invoking Spark class now >>>
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000005-161228161942928-oozie-oozi-W/spark-cc87--spark/action-data.seq
Oozie Launcher ends
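For reference, a minimal "just prints something" job of the kind mentioned above could be as small as this (a sketch; the file name and contents are assumptions):

    # Write a trivial PySpark job and run it locally
    cat > /tmp/example.py <<'EOF'
    from pyspark import SparkContext

    # Trivial job: start a context, print, stop
    sc = SparkContext(appName="myApp")
    print("hello from spark")
    sc.stop()
    EOF
    spark-submit --master 'local[*]' /tmp/example.py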
[EDIT]
I found out that the workflow starts only if I set the Spark master to yarn-cluster, but even in this mode the YARN container that gets launched remains stuck at 95% map completion while the Spark app stays in ACCEPTED status. I'm trying to change the YARN memory parameters to allow the Spark action to start. The stdout just prints Heartbeat.
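A Spark application stuck in ACCEPTED usually means YARN cannot satisfy the requested container sizes on the quickstart VM. The complementary direction to raising the YARN limits is shrinking Spark's own requests; as a sketch (all values are assumptions, to be tuned against yarn.nodemanager.resource.memory-mb):

    # Ask for small containers so the quickstart VM's YARN can place them
    spark-submit \
        --master yarn-cluster \
        --name myApp \
        --driver-memory 512m \
        --executor-memory 512m \
        --num-executors 1 \
        example.py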
[SOLVED]
The Oozie workflow starts only if the .py file is local and manually placed into the lib folder after Hue has created the workflow folder. I think the best solution is still to write a shell script with a spark-submit.
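A minimal sketch of such a wrapper, to be called from an Oozie shell action (the script name and the file-shipping mechanism are assumptions):

    #!/bin/bash
    # submit_job.sh -- hypothetical wrapper for an Oozie shell action.
    # example.py is assumed to be shipped with the action (e.g. listed
    # in the action's <file> elements) so it lands in the container's
    # working directory.
    spark-submit \
        --master yarn-cluster \
        --name myApp \
        example.py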
The error you are showing is from the stdout file of your Oozie job. Can you check the stderr file and post its output here? It might have more clues about your issue.
You can use the Oozie web console to trace the Oozie job logs.
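Alternatively, the same logs can be pulled from the command line (a sketch; the Oozie URL assumes the default port 11000, and the job id is taken from the launcher output above):

    oozie job -oozie http://localhost:11000/oozie \
        -log 0000005-161228161942928-oozie-oozi-W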