In Short
I want to configure my application to use lz4 compression instead of snappy. What I did is:
    session = SparkSession.builder()
            .master(SPARK_MASTER) // local[1]
            .appName(SPARK_APP_NAME)
            .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
            .getOrCreate();
But looking at the console output, it's still using snappy in the executor:
org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY
and
[Executor task launch worker-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.snappy]
According to this post, what I did here only configures the driver, not the executor. The solution in that post is to change the spark-defaults.conf file, but I'm running Spark in local mode and don't have that file anywhere.
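For reference, the change that post describes would look roughly like this in spark-defaults.conf (which normally lives under $SPARK_HOME/conf; a sketch only, since I don't have the file):

    # spark-defaults.conf -- the kind of entry the linked post suggests
    spark.io.compression.codec    lz4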
Some more detail:
I need to run the application in local mode (for the purpose of unit testing). The tests work fine locally on my machine, but when I submit them to a build engine (RHEL5_64), I get the error
snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found
I did some research and it seems the simplest fix is to use lz4 instead of snappy as the codec, so I tried the approach above.
I have been stuck on this issue for several hours; any help is appreciated, thank you.
Posting my solution here: @user8371915's answer does address the question, but it did not solve my problem, because in my case I can't modify the property files.
What I ended up doing was adding another configuration.
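Something along these lines, where the exact property and value are just an illustration (spark.sql.parquet.compression.codec is the setting that controls the Parquet writer's codec, and it defaults to snappy):

    SparkSession session = SparkSession.builder()
            .master(SPARK_MASTER) // local[1]
            .appName(SPARK_APP_NAME)
            .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
            // The Parquet writer reads its codec from spark.sql.parquet.compression.codec,
            // not from spark.io.compression.codec, so it has to be overridden separately.
            .config("spark.sql.parquet.compression.codec", "gzip")
            .getOrCreate();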