Spark saveAsNewAPIHadoopFile works in local mode but not in cluster mode


After upgrading to CDH5.4 and Spark Streaming 1.3, I'm encountering a strange issue where saveAsNewAPIHadoopFile no longer saves files to HDFS as it's supposed to. I can see the _temp directory being generated, but when the save completes, _temp is removed, leaving the directory empty except for a SUCCESS file. I suspect the files are generated but cannot be moved out of the _temp directory before it is deleted.

This issue only happens when running on the Spark cluster (standalone mode). If I run the job with local Spark, the files are saved as expected.

Some help would be appreciated.


There is 1 answer

David Adams On

Are you running this on your laptop/desktop?

One way this can happen is if the path you use for your output is a relative path on NFS. In that case, Spark treats the relative path as hdfs:// rather than file://, so it can't write the output to disk.