We are trying to install the Kubernetes Spark operator and write a sample SparkApplication that connects to S3 and writes a file. However, no matter what we try, we cannot get rid of the error below:

WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23/01/15 15:00:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoSuchMethodError: 'char[] org.apache.hadoop.conf.Configuration.getPassword(java.lang.String)'
        at org.apache.spark.SSLOptions$.$anonfun$parse$8(SSLOptions.scala:188)
        at scala.Option.orElse(Option.scala:447)
        at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:188)
        at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:98)
        at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
        at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Spark application creation process for the Spark operator:

  1. Created the base image for Spark

$ cd spark-3.1.1-bin-hadoop3.2

$ ./bin/docker-image-tool.sh -r <registryurl>/nks/sparkoperator/base -t 3.1.1 -u 1000 -b java_image_tag=11-jre-slim build

This created the base image, which was then pushed to Artifactory as <registryurl>/nks/sparkoperator/base/spark:3.1.1

  2. Created the folders, Dockerfile, build.sbt, and application file for the actual application (a sketch of the application file follows the directory tree below)
.
├── Dockerfile
├── build.sbt
├── plugins.sbt
└── src
    └── main
        └── scala
            └── com
                └── company
                    └── xyz
                        └── ParquetAWSExample.scala
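
A minimal sketch of the application file (the exact ParquetAWSExample.scala is not reproduced here; the bucket name, output path, and credential handling are placeholder assumptions, not the actual code):

package com.company.xyz

import org.apache.spark.sql.SparkSession

// Minimal example: build a tiny DataFrame and write it to S3 as Parquet.
object ParquetAWSExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetAWSExample")
      .getOrCreate()

    // s3a settings; in practice the keys would come from env vars, an IAM
    // role, or a Kubernetes secret rather than being hard-coded.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    hadoopConf.set("fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
    hadoopConf.set("fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))

    import spark.implicits._
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

    // "my-bucket" and the prefix are placeholders.
    df.write.mode("overwrite").parquet("s3a://my-bucket/test/parquet-example")

    spark.stop()
  }
}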

Dockerfile

FROM <registryurl>/nks/sparkoperator/base/spark:3.1.1
USER root

RUN apt-get update && apt-get install -y wget

ARG SBT_VERSION
ENV SBT_VERSION=${SBT_VERSION:-1.5.1}

RUN wget -O - https://github.com/sbt/sbt/releases/download/v${SBT_VERSION}/sbt-${SBT_VERSION}.tgz | gunzip | tar -x -C /usr/local
#WORKDIR /spark

ENV PATH /usr/local/sbt/bin:${PATH}

WORKDIR /app
COPY . /app
ADD plugins.sbt /app/project/
RUN sbt update

RUN sbt clean assembly

This builds the Docker image <registryurl>/nks/testsparkoperatorv2/s3conn:1.5

build.sbt

name := "xyz"
version := "0.1"

scalaVersion := "2.12.11"


libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.1.1" ,
      "org.apache.spark" %% "spark-sql" % "3.1.1",
      "org.apache.hadoop" % "hadoop-aws" % "3.1.1",
    )

dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.1.1"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
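
plugins.sbt is not reproduced here; since build.sbt uses sbt-assembly settings, it presumably declares that plugin. A minimal sketch (the plugin version is an assumption for sbt 1.5.x):

// project/plugins.sbt (sketch; version is assumed)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")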

spark-application.yaml

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: parquet.test
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "<registryurl>/nks/testsparkoperatorv2/s3conn:1.5"
  imagePullPolicy: Always
  imagePullSecrets:
    - myregistrykey
  mainClass: com.company.xyz.ParquetAWSExample
  mainApplicationFile: "local:///app/target/scala-2.12/xyz-assembly-0.1.jar"
  sparkVersion: "3.1.1"
  driver:
    memory: 512m
    labels:
      version: 3.1.1
    serviceAccount: sparkoperator
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: 512m
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"


There are 2 answers

Answer from Abdennacer Lachiheb:

This looks like a version mismatch between Spark and Hadoop: Spark 3.1.1 is built for Hadoop 3.2 or higher, while your build.sbt pulls in hadoop-aws and hadoop-common 3.1.1. It is recommended to use a recent, matching Hadoop version for optimal performance and security.

Try this build.sbt file:

name := "xyz"
version := "0.1"

scalaVersion := "2.12.11"


libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.1.1" ,
      "org.apache.spark" %% "spark-sql" % "3.1.1",
      "org.apache.hadoop" % "hadoop-aws" % "3.2.0", 
    )

dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.2.0"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

Also, you could try adding this dependency:

"org.apache.hadoop" % "hadoop-hdfs-client" % "3.2.0", 

Answer from Pro:

Found the solution. I was running the command below from within the spark-3.1.1-bin-hadoop3.2 folder to build the base image:

$ ./bin/docker-image-tool.sh \
      -r <registryurl>/nks/sparkoperator/base \
      -t 3.1.1 \
      -u 1000 \
      -b java_image_tag=11-jre-slim build

However, the script used the default Spark installed on my system, which was older and contained kubernetes-client jar version 5.4.1, which wasn't compatible with our Kubernetes version (1.22).

Therefore, I set SPARK_HOME to /spark-3.1.1-bin-hadoop3.2, rebuilt the base image, and everything worked afterward.