In Short
I want to configure my application to use lz4 compression instead of snappy. What I did is:
    session = SparkSession.builder()
            .master(SPARK_MASTER) // local[1]
            .appName(SPARK_APP_NAME)
            .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
            .getOrCreate();
But looking at the console output, it's still using snappy in the executor:
org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY
and
[Executor task launch worker-0] compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.snappy]
According to this post, what I did here only configures the driver, not the executor. The solution in that post is to change the spark-defaults.conf file, but I'm running Spark in local mode and don't have that file anywhere.
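For reference, the change that post describes would look roughly like this in spark-defaults.conf (which normally lives under $SPARK_HOME/conf; a sketch only, since I don't have the file):

    # spark-defaults.conf -- the kind of entry the linked post suggests
    spark.io.compression.codec    lz4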
Some more detail:
I need to run the application in local mode (for the purpose of unit testing). The tests work fine locally on my machine, but when I submit them to a build engine (RHEL5_64), I get the error
snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found
I did some research and it seems the simplest fix is to use lz4 instead of snappy as the codec, so I tried the approach above.
I have been stuck on this issue for several hours; any help is appreciated, thank you.
Posting my solution here: @user8371915's answer does address the question, but it did not solve my problem, because in my case I can't modify the property files.
What I ended up doing was adding another configuration.
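Something along these lines, where the exact property and value are just an illustration (spark.sql.parquet.compression.codec is the setting that controls the Parquet writer's codec, and it defaults to snappy):

    SparkSession session = SparkSession.builder()
            .master(SPARK_MASTER) // local[1]
            .appName(SPARK_APP_NAME)
            .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
            // The Parquet writer reads its codec from spark.sql.parquet.compression.codec,
            // not from spark.io.compression.codec, so it has to be overridden separately.
            .config("spark.sql.parquet.compression.codec", "gzip")
            .getOrCreate();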