We are trying to install the Kubernetes Spark operator and write one sample SparkApplication that connects to S3 and writes a file. However, no matter what we do, we cannot get rid of the error below:
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23/01/15 15:00:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoSuchMethodError: 'char[] org.apache.hadoop.conf.Configuration.getPassword(java.lang.String)'
at org.apache.spark.SSLOptions$.$anonfun$parse$8(SSLOptions.scala:188)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:188)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:98)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark application creation process for the Spark operator:
- Created the base image for Spark:
$ cd spark-3.1.1-bin-hadoop3.2
$ ./bin/docker-image-tool.sh -r <registryurl>/nks/sparkoperator/base -t 3.1.1 -u 1000 -b java_image_tag=11-jre-slim build
This created the base image, which was then pushed to Artifactory as:
<registryurl>/nks/sparkoperator/base/spark:3.1.1
- Created the folders, Dockerfile, build.sbt, and the application file for the actual application (a simplified sketch of the application source follows the tree below):
.
├── Dockerfile
├── build.sbt
├── plugins.sbt
└── src
    └── main
        └── scala
            └── com
                └── company
                    └── xyz
                        └── ParquetAWSExample.scala
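For reference, ParquetAWSExample.scala is roughly along these lines (simplified; the bucket name and the credential handling shown here are placeholders):

package com.company.xyz

import org.apache.spark.sql.SparkSession

object ParquetAWSExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetAWSExample")
      .getOrCreate()

    // s3a connector configuration; credentials are assumed to come from env vars here.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    hadoopConf.set("fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
    hadoopConf.set("fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))

    import spark.implicits._

    // Write a tiny sample dataset as parquet to a placeholder bucket/path.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    df.write.mode("overwrite").parquet("s3a://my-test-bucket/parquet-test/")

    spark.stop()
  }
}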
Dockerfile
FROM <registryurl>/nks/sparkoperator/base/spark:3.1.1
USER root
RUN apt-get update && apt-get install -y wget
ARG SBT_VERSION
ENV SBT_VERSION=${SBT_VERSION:-1.5.1}
RUN wget -O - https://github.com/sbt/sbt/releases/download/v${SBT_VERSION}/sbt-${SBT_VERSION}.tgz | gunzip | tar -x -C /usr/local
#WORKDIR /spark
ENV PATH /usr/local/sbt/bin:${PATH}
WORKDIR /app
COPY . /app
ADD plugins.sbt /app/project/
RUN sbt update
RUN sbt clean assembly
This builds the Docker image - <registryurl>/nks/testsparkoperatorv2/s3conn:1.5
build.sbt
name := "xyz"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.1",
  "org.apache.spark" %% "spark-sql" % "3.1.1",
  "org.apache.hadoop" % "hadoop-aws" % "3.1.1"
)

dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.1.1"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
spark-application.yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: parquet.test
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "<registryurl>/nks/testsparkoperatorv2/s3conn:1.5"
  imagePullPolicy: Always
  imagePullSecrets:
    - myregistrykey
  mainClass: com.company.xyz.ParquetAWSExample
  mainApplicationFile: "local:///app/target/scala-2.12/xyz-assembly-0.1.jar"
  sparkVersion: "3.1.1"
  driver:
    memory: 512m
    labels:
      version: 3.1.1
    serviceAccount: sparkoperator
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: 512m
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
This seems like a version mismatch between Spark and Hadoop. Your base image is built from spark-3.1.1-bin-hadoop3.2, which bundles Hadoop 3.2 jars, while your build.sbt pins hadoop-aws and hadoop-common to 3.1.1; mixing Hadoop versions on the classpath commonly produces NoSuchMethodError failures like the one above. It is recommended to keep all Hadoop artifacts on the same version line as the Hadoop bundled with your Spark distribution.
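To double-check which Hadoop version actually ends up on the classpath inside your image, you could run a tiny job like this (just a sketch; VersionInfo ships with hadoop-common):

import org.apache.hadoop.util.VersionInfo

object HadoopVersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints the Hadoop version resolved on the classpath,
    // e.g. 3.2.0 for the spark-3.1.1-bin-hadoop3.2 distribution.
    println(s"Hadoop on classpath: ${VersionInfo.getVersion}")
  }
}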
Try this build.sbt file:
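Something along these lines, assuming the spark-3.1.1-bin-hadoop3.2 base image (Hadoop artifacts pinned to 3.2.0, Spark jars marked provided because the base image already ships them):

name := "xyz"
version := "0.1"
scalaVersion := "2.12.11"

// Spark jars already exist in the operator base image, so mark them "provided"
// to keep them out of the assembly jar; pin Hadoop artifacts to the version
// bundled with spark-3.1.1-bin-hadoop3.2 (3.2.0).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "3.1.1" % "provided",
  "org.apache.hadoop" % "hadoop-aws" % "3.2.0"
)

dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.2.0"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}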
Also, maybe try adding this dependency:
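For example (the exact artifact is an assumption here), the matching hadoop-client, so that every Hadoop module resolves to the same 3.2.0 version:

// Hypothetical addition: keeps all Hadoop modules on the same 3.2.0 version as hadoop-aws.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.2.0"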