I'm trying to choose the right format for file exchange with my Spark application. I use Spark 2.4.7 + Hadoop 2.10 on Kubernetes.
My app downloads a CSV file from S3 and processes it. The file is provided by a third-party company. I was thinking about asking them to use lz4, lzo, or another splittable compression format. However, from what I can see, the file format produced by the command-line tools is not compatible with the Hadoop lz4 or lzo codecs (I tried the lzop and lz4 CLIs).
Do you know of any CLI tools that can produce lz4- or lzo-compressed files in a format the Hadoop codecs will understand?