I am trying to deploy a Spark job to a Spark cluster and am running into what I believe is a classloading issue.
Error details below:
java.lang.ClassNotFoundException: org.examples.datasets.FlightDataProcessor
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/06/02 06:26:11 INFO ShutdownHookManager: Shutdown hook called
Currently, I have to copy the application JARs into Spark's jars folder as part of the deployment script to work around this issue (roughly as sketched below).
But I believe spark-submit should handle this itself; having to copy all the client program JARs into the jars folder every time I want to deploy a Spark job is not convenient at all.
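For reference, the workaround in my deployment script looks roughly like this (a simplified sketch; the JAR name, paths, and master URL are placeholders for my actual values, and the main class is the one from the stack trace):

```bash
# Workaround: copy the application JAR into Spark's jars folder before submitting.
# JAR name, paths, and master URL below are placeholders.
cp target/spark-cassandra-example-1.0.jar "$SPARK_HOME/jars/"

"$SPARK_HOME/bin/spark-submit" \
  --class org.examples.datasets.FlightDataProcessor \
  --master spark://spark-master:7077 \
  target/spark-cassandra-example-1.0.jar
```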
Environment Info:
Spark version: 2.4.3 (Scala 2.11 build)
Scala version: 2.11.8
Here are links to my source code:
* Deployment script: https://github.com/anhtv08/spark-cassandra-example/blob/master/scripts/submit_spark_flight_job.sh
* Spark job code
I appreciate any help.
Since the code is available as part of a JAR, we can upload the JAR to HDFS or a Maven repository and use either of the following options (a sketch of both follows the list).
* `--jars` / `spark.jars` - specify the paths to the JARs uploaded to HDFS; the listed JARs are added to the driver and executor classpaths.
* `--packages` / `spark.jars.packages` - specify the Maven coordinates if the JAR can be published to a Maven repository (an additional repository, and credentials if it requires them, can be pointed to via `--repositories` / `spark.jars.repositories`).
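As a rough sketch of what a submit command could look like with these options (the HDFS paths, master URL, and Maven coordinate below are placeholders; the main class is taken from the stack trace in the question):

```bash
# Option 1: reference the application JAR (and any extra dependency JARs via --jars)
# from HDFS at submit time instead of copying them into Spark's jars folder.
spark-submit \
  --class org.examples.datasets.FlightDataProcessor \
  --master spark://spark-master:7077 \
  --jars hdfs:///apps/spark/libs/some-dependency.jar \
  hdfs:///apps/spark/spark-cassandra-example-1.0.jar

# Option 2: let Spark resolve dependencies as Maven coordinates via --packages
# (the coordinate below is illustrative; add --repositories for a private repo).
spark-submit \
  --class org.examples.datasets.FlightDataProcessor \
  --master spark://spark-master:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1 \
  hdfs:///apps/spark/spark-cassandra-example-1.0.jar
```

Either way, nothing needs to be copied into `$SPARK_HOME/jars` on the cluster nodes as part of deployment.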