I am trying to set up a YARN-managed Spark cluster and intend to use S3 for reading and writing. My setup consists of:
- Hadoop with YARN (3.3.4)
- Spark "without Hadoop" build (3.5.1), pointed at the Hadoop 3.3.4 classpath (see the sketch below)
- hadoop-aws-3.3.4.jar and aws-java-sdk-bundle-1.11.656.jar
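Since this is the "Hadoop free" Spark build, spark-env.sh wires it to the Hadoop installation, roughly like this (a sketch following the Spark "Hadoop Free Build" docs; the Hadoop paths here are placeholders):

```bash
# $SPARK_HOME/conf/spark-env.sh -- sketch; paths are placeholders
export HADOOP_CONF_DIR=/home/admin/hadoop/etc/hadoop
# Let the "without Hadoop" Spark build pick up Hadoop 3.3.4's jars
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```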
Some relevant files (for context):
$HADOOP_CONF/core-site.xml:
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://my_storage_bucket</value>
  </property>
</configuration>
```
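For what it's worth, the s3a connector can be exercised outside of Spark through the Hadoop CLI (a sketch; this assumes hadoop-aws is put on Hadoop's own classpath, e.g. via HADOOP_OPTIONAL_TOOLS in hadoop-env.sh):

```bash
# hadoop-env.sh: pull the bundled S3A connector onto Hadoop's classpath
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
# List the bucket through s3a, bypassing Spark entirely
hadoop fs -ls s3a://my_storage_bucket/
```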
$HADOOP_CONF/yarn-site.xml:
```xml
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/tmp/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/tmp/nm-log-dir</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>local://home/admin/hadoop-aws-3.3.4.jar,local://aws-java-sdk-bundle-1.11.656.jar</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ec2_instance_dns_name</value>
  </property>
</configuration>
```
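As I understand it, yarn.application.classpath only affects launched containers, so as a quick node-level check of which AWS jars Hadoop itself can see (this uses the standard `hadoop classpath` command; the grep patterns are just for filtering):

```bash
# Expand the node's Hadoop classpath and check whether the AWS jars appear
hadoop classpath --glob | tr ':' '\n' | grep -i -e aws -e s3a
```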
When I run a simple Spark application:
```bash
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraClassPath=local://home/admin/aws-java-sdk-bundle-1.11.656.jar" \
  --conf "spark.driver.extraClassPath=local://home/admin/hadoop-aws-3.3.4.jar" \
  --jars local://home/admin/aws-java-sdk-bundle-1.11.656.jar,local://home/admin/hadoop-aws-3.3.4.jar \
  $SPARK_HOME/examples/jars/spark-examples*.jar 10
```
it fails with:
```
24/03/19 23:34:59 INFO Client:
	 client token: N/A
	 diagnostics: [Tue Mar 19 23:34:58 +0000 2024] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:8192, vCores:8> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; Queue's capacity (absolute resource) = <memory:8192, vCores:8> ; Queue's used capacity (absolute resource) = <memory:0, vCores:0> ; Queue's max capacity (absolute resource) = <memory:8192, vCores:8> ;
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1710891298284
	 final status: UNDEFINED
	 tracking URL: http://dns_name.us-west-2.compute.amazonaws.com:8088/proxy/application_1711110592810_0006/
	 user: admin
24/03/19 23:35:00 INFO Client: Application report for application_1710890592810_0006 (state: FAILED)
24/03/19 23:35:00 INFO Client:
	 client token: N/A
	 diagnostics: Application application_17111110592810_0006 failed 2 times due to AM Container for appattempt_17111110592810_0006_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2024-03-19 23:35:00.012]Failed to download resource { { s3a://sb-test00/user/admin/.sparkStaging/application_1711110592810_0006/JLargeArrays-1.5.jar, 1711111298000, FILE, null },pending,[(container_1710890592810_0006_02_000001)],14575954948080,DOWNLOADING}
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
```
Failing this attempt.Diagnostics: [2024-03-19 23:35:00.012]Failed to download resource { { s3a://sb-test00/user/admin/.sparkStaging/application_1711110592810_0006/JLargeArrays-1.5.jar, 1711111298000, FILE, null },pending,[(container_1710890592810_0006_02_000001)],14575954948080,DOWNLOADING} java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found```