I need to download a binary file in a proprietary format, convert it, and then move the converted file back to storage.
I create a directory/file under /tmp using Java's Files.createTempDirectory and perform the conversion there. I have tried running the code on both the driver and the workers.
When I run Spark locally, it works. But on a managed cluster on Dataproc, I get a FileNotFoundException.
Is there a recommended way to process a binary file on Spark? Or is there another temporary location where I should save the intermediate files?
PS: Using a binary data source or a stream does not work in my case, since I rely on an external library that accepts file paths only.
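For reference, this is roughly the pattern I'm following. The `convert` method here is a hypothetical stand-in for the external library call (which only accepts paths), and the download/upload steps are simulated with local byte writes; in the real job this body runs inside a task (e.g. `mapPartitions`) on an executor:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ConvertLocally {

    // Stand-in for the external library, which accepts file paths only.
    static void convert(Path in, Path out) throws IOException {
        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Per-task scratch space on the local filesystem.
        Path tmpDir = Files.createTempDirectory("conversion-");
        Path input = tmpDir.resolve("input.bin");
        Path output = tmpDir.resolve("output.bin");

        // Download step (simulated here by writing sample bytes).
        Files.write(input, new byte[] {1, 2, 3});

        // Conversion via the path-based external library.
        convert(input, output);

        // In the real job, the converted file is uploaded back to storage here.
        System.out.println(Files.exists(output));

        // Clean up the scratch files.
        Files.deleteIfExists(input);
        Files.deleteIfExists(output);
        Files.deleteIfExists(tmpDir);
    }
}
```

This works as expected in local mode; the FileNotFoundException only appears on the Dataproc cluster.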