Why does sbt assembly in Spark project fail with "Please add any Spark dependencies by supplying the sparkVersion and sparkComponents"?


I work on an sbt-managed Spark project with a spark-cloudant dependency. The code is available on GitHub (on the spark-cloudant-compile-issue branch).

I've added the following line to build.sbt:

"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"

So build.sbt now looks as follows:

name := "Movie Rating"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies ++= {
  val sparkVersion = "1.6.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
    "org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
    "org.apache.kafka" % "kafka-clients" % "0.9.0.0",
    "org.apache.kafka" %% "kafka" % "0.9.0.0",
    "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
  )
}

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", xs @ _*) => MergeStrategy.first
  case PathList("scala", xs @ _*) => MergeStrategy.discard
  case PathList("META-INF", "maven", "org.slf4j", xs @ _* ) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

unmanagedBase <<= baseDirectory { base => base / "lib" }

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

When I execute sbt assembly, I get the following error:

java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents. Please remove: org.apache.spark:spark-core:1.6.0:provided

There are 2 answers

Answer by Jacek Laskowski (best answer)

NOTE: I still can't reproduce the issue, but I think that does not really matter.

java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.
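The message is emitted by the sbt-spark-package plugin, which expects Spark's own modules to be declared through its sparkVersion and sparkComponents settings rather than listed by hand in libraryDependencies. A minimal sketch of the question's build.sbt rewritten that way, assuming the plugin is already enabled in project/plugins.sbt and that it maps each component name to spark-<name> with "provided" scope (check the plugin's README for the version you use). spark-cloudant is left out here and still needs its own declaration, for example with the resolver below.

```scala
// build.sbt (sketch): assumes the sbt-spark-package plugin is enabled.
name := "Movie Rating"

version := "1.0"

scalaVersion := "2.10.5"

// Spark modules go through the plugin's settings instead of libraryDependencies.
// Component names omit the "spark-" prefix (assumed mapping: "sql" -> spark-sql).
sparkVersion := "1.6.0"

sparkComponents ++= Seq("sql", "streaming", "streaming-kafka", "mllib")

// Non-Spark dependencies stay in libraryDependencies as before.
libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
  "org.apache.kafka" % "kafka-clients" % "0.9.0.0",
  "org.apache.kafka" %% "kafka" % "0.9.0.0"
)
```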

In your case, your build.sbt is missing an sbt resolver for the spark-cloudant dependency. Add the following line to build.sbt:

resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
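A sketch of the relevant build.sbt lines with that resolver in place. Two hedges: Bintray (dl.bintray.com) was shut down in 2021, so on a current build the Spark Packages repository is, to my knowledge, served from https://repos.spark-packages.org/ instead; and sbt-spark-package also has an spDependencies setting that takes a package as "owner/name:version", shown commented out as an alternative to verify against the plugin's documentation.

```scala
// build.sbt (sketch): resolver for the Spark Packages Maven repository.
// Original URL from the answer; Bintray has since been retired, so the
// commented-out URL is the assumed current location.
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
// resolvers += "spark-packages" at "https://repos.spark-packages.org/"

// Same coordinates as in the question (keep only one declaration of it).
// The version already encodes the Scala binary version (s_2.10), hence % not %%.
libraryDependencies += "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"

// Alternative via the sbt-spark-package plugin (owner/name:version):
// spDependencies += "cloudant-labs/spark-cloudant:1.6.4-s_2.10"
```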

PROTIP: I strongly recommend using spark-shell first, and switching to sbt only once you're comfortable with the package (especially if you're new to sbt and perhaps to the other libraries/dependencies too). It's too much to digest in one bite. Follow https://spark-packages.org/package/cloudant-labs/spark-cloudant.

Answer by The_Tourist

Probably related: https://github.com/databricks/spark-csv/issues/150

Can you try adding spIgnoreProvided := true to your build.sbt?
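A minimal sketch of that suggestion, assuming the project uses the sbt-spark-package plugin (the source of the error message) in a version that defines spIgnoreProvided. The setting is meant to relax the plugin's check on Spark dependencies declared with "provided" scope, so the existing libraryDependencies block can stay as it is; verify the exact semantics against the plugin version in use. Setting sparkVersion to match the explicit dependencies is an extra precaution, not part of the original suggestion.

```scala
// build.sbt (sketch): keep the question's libraryDependencies block unchanged.
// Assumes sbt-spark-package is enabled and its version defines spIgnoreProvided.
spIgnoreProvided := true

// Optional precaution (assumption): keep the plugin's Spark version in line
// with the "1.6.0" used in the explicit Spark dependencies.
sparkVersion := "1.6.0"
```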

(This might not be the answer; I would have posted it as a comment, but I don't have enough reputation.)