I am creating an application in Spark. I use Avro files in HDFS with Hadoop 2. I use Maven, and I include Avro like this:
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.6</version>
    <classifier>hadoop2</classifier>
</dependency>
I wrote a unit test, and when I run mvn test everything works. But when I launch with spark-submit it fails, and I get this error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
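For context, the job reads the Avro files roughly like this (a minimal sketch; the HDFS path, app name, and record type are placeholders, not my real code):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

object AvroReadJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-read"))
    // Reading through the new (mapreduce) API is what ends up calling
    // AvroKeyInputFormat.createRecordReader, where the error is thrown.
    val records = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
      AvroKeyInputFormat[GenericRecord]]("hdfs:///path/to/data.avro")
    println(records.count())
    sc.stop()
  }
}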
Can you help me?
Thank you.
OK, I found the solution :D Thanks to http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Unable-to-Read-Write-Avro-RDD-on-cluster-td10893.html.
The solution is to add the hadoop2-classified avro-mapred jar to your SPARK_CLASSPATH. (The error means a copy of avro-mapred built against Hadoop 1, where TaskAttemptContext was a class rather than an interface, was being picked up at runtime.)
You can download the jar here: http://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.7.7/
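For example (the jar path and main class below are placeholders for your own):

# Put the hadoop2 build of avro-mapred ahead of the classes bundled
# with the Spark assembly, then launch as usual.
export SPARK_CLASSPATH=/path/to/avro-mapred-1.7.7-hadoop2.jar
spark-submit --class com.example.MyApp my-app.jar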