Reading a zst archive in Scala & Spark: native zStandard library not available


I'm trying to read a zst-compressed file using Spark on Scala.

 import org.apache.spark.sql._
 import org.apache.spark.sql.types._
 val schema = new StructType()
      .add("title", StringType, true)
      .add("selftext", StringType, true)
      .add("score", LongType, true)
      .add("created_utc", LongType, true)
      .add("subreddit", StringType, true)
      .add("author", StringType, true)
 val df_with_schema = spark.read.schema(schema).json("/home/user/repos/concepts/abcde/RS_2019-09.zst")

 df_with_schema.take(1)

Unfortunately this produces the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.0.101 executor driver): java.lang.RuntimeException: native zStandard library not available: this version of libhadoop was built without zstd support.

The output of hadoop checknative looks as follows, but I understand from here that Apache Spark ships its own ZStandardCodec.

Native library checking:

  • hadoop: true /opt/hadoop/lib/native/libhadoop.so.1.0.0
  • zlib: true /lib/x86_64-linux-gnu/libz.so.1
  • zstd: true /lib/x86_64-linux-gnu/libzstd.so.1
  • snappy: true /lib/x86_64-linux-gnu/libsnappy.so.1
  • lz4: true revision:10301
  • bzip2: true /lib/x86_64-linux-gnu/libbz2.so.1
  • openssl: false EVP_CIPHER_CTX_cleanup
  • ISA-L: false libhadoop was built without ISA-L support
  • PMDK: false The native code was built without PMDK support.

Any ideas are appreciated, thank you!

UPDATE 1: As per this post, I now understand the message better: zstd support is not enabled by default when compiling Hadoop, so one possible solution would be to rebuild Hadoop with that flag enabled.
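For reference, rebuilding Hadoop with native zstd support would look roughly like this. This is a sketch based on the flags documented in Hadoop's BUILDING.txt; it assumes the zstd development headers and the usual Hadoop native build toolchain (JDK, Maven, CMake, protobuf) are installed.

```shell
# Build Hadoop's native libraries with zstd support enabled.
# -Drequire.zstd makes the build fail loudly if zstd cannot be linked in,
# instead of silently producing a libhadoop without zstd.
mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.zstd
```

Afterwards, hadoop checknative should report zstd: true for the freshly built libhadoop.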

1 Answer

Answer by cnstlungu (accepted)

Since I didn't want to build Hadoop myself, inspired by the workaround used here, I configured Spark to use Hadoop's native libraries:

spark.driver.extraLibraryPath=/opt/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/hadoop/lib/native
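The two properties above can live in spark-defaults.conf, or be passed per invocation on the command line. A sketch of the one-off form, using the /opt/hadoop path from the question:

```shell
# Point the JVM native library path at Hadoop's native libs so Spark
# can load libhadoop with zstd support for both driver and executors.
spark-shell \
  --conf spark.driver.extraLibraryPath=/opt/hadoop/lib/native \
  --conf spark.executor.extraLibraryPath=/opt/hadoop/lib/native
```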

I can now read the zst archive into a DataFrame with no issues.