Preparing lzo or lz4 files for Spark

I'm trying to choose the right format for exchanging files with my Spark application. I use Spark 2.4.7 and Hadoop 2.10 on Kubernetes. My app downloads a CSV file from S3 and processes it. The file is provided by a third-party company.

I was thinking about asking them to use lz4, lzo, or another splittable compression format. However, from what I can see, the file formats produced by the command-line tools are not compatible with the Hadoop lz4 or lzo codecs (I tried the lzop and lz4 CLIs).
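To check which container format a given file actually uses, one option is to look at its leading magic bytes. The following is only a minimal sketch in Python, not part of my app: `sniff_container` and `sniff_file` are hypothetical helper names, and it assumes the lz4 CLI writes the LZ4 frame format (magic number 0x184D2204, stored little-endian) and that lzop writes its own nine-byte header magic.

```python
# Magic bytes the command-line tools are assumed to write at the start of a file.
LZ4_FRAME_MAGIC = bytes([0x04, 0x22, 0x4D, 0x18])  # 0x184D2204, little-endian
LZOP_MAGIC = bytes([0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A])


def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file (hypothetical helper)."""
    if header.startswith(LZ4_FRAME_MAGIC):
        return "lz4-frame"
    if header.startswith(LZOP_MAGIC):
        return "lzop"
    # No known magic: could be some other framing, or not compressed at all.
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(16))
```

As far as I understand, Hadoop's built-in Lz4Codec uses its own block-stream framing with no such magic number, so a result of "unknown" does not by itself mean the file is readable by Hadoop.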

Do you know of any CLI tools that can produce lz4- or lzo-compressed files in a format that the Hadoop codecs will understand?

There are 0 answers