I get the following exception when reading files from a MapR directory:
java.lang.NullPointerException
at com.mapr.fs.MapRFsInStream.read(MapRFsInStream.java:150)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:203)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:43)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:167)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:37)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:90)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:90)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:37)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:240)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:149)
at org.apache.spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
When I run it on a local Spark interpreter, I do not get an exception. My guess is that the file type is causing the exception. Any idea what is causing this NPE?
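The job is essentially of this shape (a simplified sketch; the input path, key extraction, and app name are placeholders, not the real ones):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object MaprRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("maprfs-read"))

    // Read text files from the MapR directory (placeholder path).
    val lines = sc.textFile("maprfs:///user/example/input")

    // Key/value aggregation; reduceByKey is what produces the ShuffleMapTask and
    // Aggregator.combineValuesByKey frames seen in the stack trace above.
    val counts = lines.map(line => (line.split("\t")(0), 1L)).reduceByKey(_ + _)

    println(counts.count())
    sc.stop()
  }
}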
Can you give some more context about what you are attempting to run here, e.g. the versions of the components involved?
The NPE above normally happens if the FileSystem object is closed before a Map/Reduce job has finished reading its input data and writing its output. Spark might be running into something similar here.
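As an illustration (not necessarily what your job is doing), the usual pattern behind this is user code obtaining the cached, JVM-wide FileSystem instance and closing it while record readers are still streaming splits from it. A common workaround is to disable the FileSystem cache for the scheme so each caller gets its own instance; the configuration key below follows Hadoop's fs.<scheme>.impl.disable.cache convention and is an assumption for maprfs:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val conf = new Configuration()

// FileSystem.get() returns a cached instance shared across the JVM.
// If any code closes it, readers still open on it can start throwing
// NullPointerExceptions like the one in the stack trace.
val fs = FileSystem.get(new URI("maprfs:///"), conf)
// ... other work with fs ...
fs.close()  // closing the shared instance while tasks are still reading is the usual culprit

// Possible workaround (assumption: maprfs scheme): give each caller its own
// uncached FileSystem so a stray close() cannot affect the record readers.
conf.setBoolean("fs.maprfs.impl.disable.cache", true)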