Error when launching spark-submit because of Avro


I am building a Spark application that reads Avro files from HDFS on Hadoop 2. I use Maven and include Avro like this:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.6</version>
    <classifier>hadoop2</classifier>
</dependency>

I wrote a unit test, and everything works when I run mvn test. But when I launch the application with spark-submit, it fails with this error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
    at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)

Can you help me?

Thank you


There are 2 answers

1
lea On BEST ANSWER

OK, I found the solution :D Thanks to http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Unable-to-Read-Write-Avro-RDD-on-cluster-td10893.html.

The solution is to add the Avro jars to your SPARK_CLASSPATH:

export SPARK_CLASSPATH=yourpath/avro-mapred-1.7.7-hadoop2.jar:yourpath/avro-1.7.7.jar

You can download the jar here: http://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.7.7/
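Some background on the error itself: org.apache.hadoop.mapreduce.TaskAttemptContext is a class in Hadoop 1 and an interface in Hadoop 2, so "Found interface ... but class was expected" usually means an avro-mapred jar built against Hadoop 1 is being loaded on a Hadoop 2 runtime. If you want to see what is actually on the classpath, something like the sketch below might help. It is only a diagnostic idea, not part of the fix above: the class name ClasspathCheck and the method describe are made up for illustration, and it assumes you run it with the same classpath as your Spark job.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or Avro): reports whether a type is
// a class or an interface and which jar or directory it was loaded from.
public class ClasspathCheck {

    public static String describe(String className) {
        try {
            // Load without running static initializers; we only inspect the type.
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            String kind = c.isInterface() ? "interface" : "class";
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Types that ship with the JDK have no code source.
            String location = (src == null || src.getLocation() == null)
                    ? "(JDK / bootstrap)"
                    : src.getLocation().toString();
            return kind + " " + className + " loaded from " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + className + " (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // On Hadoop 2 the first line should start with "interface".
        System.out.println(describe("org.apache.hadoop.mapreduce.TaskAttemptContext"));
        // The location printed here shows which avro-mapred jar is in use.
        System.out.println(describe("org.apache.avro.mapreduce.AvroKeyInputFormat"));
    }
}
```

If the second line points to a jar without the hadoop2 classifier, that jar is probably the one shadowing the dependency declared in the pom.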

0
lea On

But this is not a solution when using spark-submit --master yarn-cluster.

I get the same error again:

WARN scheduler.TaskSetManager: Lost task 9.1 in stage 0.0 (TID 15, 10.163.34.129): java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
    at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)

Does anyone have another idea?