Spark with YARN failing with S3 ClassNotFound on non-S3 tasks


I am trying to set up a YARN-managed Spark cluster. I intend to use S3 for reading and writing. My setup consists of:

  • Hadoop with YARN (3.3.4)
  • Spark without Hadoop (3.5.1)
    • built against Hadoop 3.3.4 (see the spark-env.sh sketch below)
  • hadoop-aws-3.3.4.jar, aws-java-sdk-bundle-1.11.656.jar
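
The "without hadoop" Spark build only picks up Hadoop classes through SPARK_DIST_CLASSPATH, normally set in conf/spark-env.sh. A minimal sketch of that wiring (the Hadoop install path here is an assumption; the jar paths are the ones listed above):

# conf/spark-env.sh -- sketch; assumes Hadoop is installed under /home/admin/hadoop
export SPARK_DIST_CLASSPATH=$(/home/admin/hadoop/bin/hadoop classpath)
# append hadoop-aws and the AWS SDK bundle if they are not already on the Hadoop classpath
export SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/home/admin/hadoop-aws-3.3.4.jar:/home/admin/aws-java-sdk-bundle-1.11.656.jar"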

Some relevant files (for context):

  • $HADOOP_CONF/core-site.xml
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>s3a://my_storage_bucket</value>
</property>
</configuration>
  • $HADOOP_CONF/yarn-site.xml
<configuration>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/tmp/nm-local-dir</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/tmp/nm-log-dir</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>local://home/admin/hadoop-aws-3.3.4.jar,local://aws-java-sdk-bundle-1.11.656.jar</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>ec2_instance_dns_name</value>
</property>
</configuration>
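
For comparison, yarn.application.classpath is normally a comma-separated list of plain classpath entries (wildcards allowed) rather than local:// URIs, and in a stock Hadoop 3.3.4 tarball the hadoop-aws / AWS SDK jars sit under share/hadoop/tools/lib, which is not on the default classpath. A sketch of appending them as plain paths (the /home/admin locations are assumed from the jar list above; not verified on this cluster):

<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,/home/admin/hadoop-aws-3.3.4.jar,/home/admin/aws-java-sdk-bundle-1.11.656.jar</value>
</property>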

When I run a simple Spark application (SparkPi, which does not itself read from or write to S3):

$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraClassPath=local://home/admin/aws-java-sdk-bundle-1.11.656.jar" \
  --conf "spark.driver.extraClassPath=local://home/admin/hadoop-aws-3.3.4.jar" \
  --jars local://home/admin/aws-java-sdk-bundle-1.11.656.jar,local://home/admin/hadoop-aws-3.3.4.jar \
  $SPARK_HOME/examples/jars/spark-examples*.jar 10
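
For comparison, spark.driver.extraClassPath / spark.executor.extraClassPath each take a single colon-separated classpath string (repeating --conf with the same key means only one value survives), and local: URIs are written as local:/ (or local:///) plus an absolute path. A variant of the same submit, purely as a sketch (it assumes both jars actually exist under /home/admin on every node; not verified):

$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraClassPath=/home/admin/hadoop-aws-3.3.4.jar:/home/admin/aws-java-sdk-bundle-1.11.656.jar" \
  --conf "spark.executor.extraClassPath=/home/admin/hadoop-aws-3.3.4.jar:/home/admin/aws-java-sdk-bundle-1.11.656.jar" \
  --jars local:///home/admin/hadoop-aws-3.3.4.jar,local:///home/admin/aws-java-sdk-bundle-1.11.656.jar \
  $SPARK_HOME/examples/jars/spark-examples*.jar 10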

My original command fails with:

24/03/19 23:34:59 INFO Client:
     client token: N/A
     diagnostics: [Tue Mar 19 23:34:58 +0000 2024] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:8192, vCores:8> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; Queue's capacity (absolute resource) = <memory:8192, vCores:8> ; Queue's used capacity (absolute resource) = <memory:0, vCores:0> ; Queue's max capacity (absolute resource) = <memory:8192, vCores:8> ;
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1710891298284
     final status: UNDEFINED
     tracking URL: http://dns_name.us-west-2.compute.amazonaws.com:8088/proxy/application_1711110592810_0006/
     user: admin
24/03/19 23:35:00 INFO Client: Application report for application_1710890592810_0006 (state: FAILED)
24/03/19 23:35:00 INFO Client:
     client token: N/A
     diagnostics: Application application_17111110592810_0006 failed 2 times due to AM Container for appattempt_17111110592810_0006_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2024-03-19 23:35:00.012]Failed to download resource { { s3a://sb-test00/user/admin/.sparkStaging/application_1711110592810_0006/JLargeArrays-1.5.jar, 1711111298000, FILE, null },pending,[(container_1710890592810_0006_02_000001)],14575954948080,DOWNLOADING} java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
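
Judging by the diagnostics, the failure happens while the NodeManager tries to download the .sparkStaging files from the s3a filesystem (exit code -1000 is a localization failure), i.e. before any of the --jars reach a container, so the S3A classes have to be visible to YARN itself. A quick check on a worker node might be (sketch; assumes the Hadoop binaries are on PATH):

# does YARN's own classpath contain the S3A jars?
yarn classpath --glob | tr ':' '\n' | grep -Ei 'hadoop-aws|aws-java-sdk'

# does the Hadoop CLI resolve the default (s3a) filesystem at all?
hadoop fs -ls /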


There are 0 answers