I have a Spark operator deployment with sparkVersion: "3.1.1" and would like to use it for structured streaming to/from MinIO. However, I have not been able to find a compatible combination of libraries for anything newer than Hadoop 2.7.0 (which does not support the newer s3a:// paths).
Is there a compatible set of Spark/Hadoop/AWS libraries for Spark 3.1.1?
My current sbt dependencies should work according to https://mvnrepository.com/, but they don't (NoSuchMethodError):
scalaVersion := "2.12.0"
lazy val Versions = new {
  val spark = "3.1.1"
  val hadoop = "3.2.0"
  val scalatest = "3.0.4"
}
"org.apache.spark" %% "spark-core" % Versions.spark % Provided
, "org.apache.spark" %% "spark-sql" % Versions.spark % Provided
, "org.apache.spark" %% "spark-hive" % Versions.spark % Provided
, "org.scalatest" %% "scalatest" % Versions.scalatest % Test
, "org.apache.hadoop" % "hadoop-aws" % Versions.hadoop
, "org.apache.hadoop" % "hadoop-common" % Versions.hadoop
, "org.apache.hadoop" % "hadoop-mapreduce-client-core" % Versions.hadoop
, "org.apache.hadoop" % "hadoop-client" % Versions.hadoop
, "com.typesafe" % "config" % "1.3.1"
, "com.github.scopt" %% "scopt" % "3.7.0"
, "com.github.melrief" %% "purecsv" % "0.1.1"
, "joda-time" % "joda-time" % "2.9.9"
Thanks a lot for any help.
This combo of libraries works. The trick is to use this image for Spark (set via the image field of the SparkApplication spec):
gcr.io/spark-operator/spark:v3.1.1-hadoop3
as the default image still ships Hadoop 2.7, even for Spark 3.1.1.
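With the Hadoop 3 image in place, the remaining piece is pointing s3a at MinIO instead of AWS. Here is a minimal sketch of the session-level settings; the endpoint and credentials are placeholders for whatever your MinIO deployment uses:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("minio-structured-streaming")
  // Placeholder endpoint/credentials for an in-cluster MinIO service.
  .config("spark.hadoop.fs.s3a.endpoint", "http://minio.minio.svc.cluster.local:9000")
  .config("spark.hadoop.fs.s3a.access.key", "minio-access-key")
  .config("spark.hadoop.fs.s3a.secret.key", "minio-secret-key")
  // MinIO serves buckets under the path, not as virtual-hosted domains.
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  // Drop this if your MinIO endpoint is served over TLS.
  .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
  .getOrCreate()

The same settings can also go under sparkConf in the SparkApplication manifest instead of in code.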