My entire build.sbt is:
name := """sparktest"""
version := "1.0.0-SNAPSHOT"
scalaVersion := "2.11.8"
scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8", "-Xexperimental")
parallelExecution in Test := false
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2",
  "org.apache.spark" %% "spark-sql" % "2.0.2",
  "org.apache.avro" % "avro" % "1.8.1",
  "org.scalatest" %% "scalatest" % "3.0.1" % "test",
  "com.holdenkarau" %% "spark-testing-base" % "2.0.2_0.4.7" % "test"
)
I have a simple test. Obviously, this is just a starting point; I'd like to test more:
package sparktest

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class SampleSuite extends FunSuite with DataFrameSuiteBase {
  test("simple test") {
    assert(1 + 1 === 2)
  }
}
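As a sketch of where this could go once the suite above works (not the author's code): DataFrameSuiteBase also exposes a sqlContext and an assertDataFrameEquals helper, so a DataFrame-level test might look like the following. The suite name and the column name "n" are made up for illustration.

```scala
package sparktest

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

// Hypothetical follow-on suite: compares two DataFrames for equality
// using the assertDataFrameEquals helper from DataFrameSuiteBase.
class DataFrameSampleSuite extends FunSuite with DataFrameSuiteBase {
  test("identical dataframes are equal") {
    import sqlContext.implicits._
    // Column name "n" and the sample rows are illustrative only.
    val expected = Seq(1, 2, 3).toDF("n")
    val actual = Seq(1, 2, 3).toDF("n")
    assertDataFrameEquals(expected, actual)
  }
}
```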
I run sbt clean test and get a failure with:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf$ConfVars
For my dev environment, I'm using spark-2.0.2-bin-hadoop2.7.tar.gz. Do I have to configure this environment in any way? Obviously, HiveConf should be a transitive Spark dependency.
As @daniel-de-paula mentions in the comments, you will need to add spark-hive as an explicit dependency (you can restrict this to the test scope if you aren't using Hive in your application itself). spark-hive is not a transitive dependency of spark-core, which is why this error happened. It is also excluded from spark-testing-base's own dependencies so that people doing RDD-only tests don't need to pull it in.
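A minimal sketch of the extra dependency for the build.sbt above, assuming the same Spark version (2.0.2) and that Hive is only needed at test time:

```scala
// spark-hive supplies org.apache.hadoop.hive.conf.HiveConf; the "test"
// scope keeps it off the runtime classpath if the app itself doesn't use Hive.
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.2" % "test"
```

If your application does use Hive at runtime, drop the "test" qualifier so the dependency is available on the main classpath as well.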