Spark Unit Testing


My entire build.sbt is:

name := """sparktest"""

version := "1.0.0-SNAPSHOT"

scalaVersion := "2.11.8"

scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8", "-Xexperimental")

parallelExecution in Test := false

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2",
  "org.apache.spark" %% "spark-sql" % "2.0.2",
  "org.apache.avro" % "avro" % "1.8.1",

  "org.scalatest" %% "scalatest" % "3.0.1" % "test",
  "com.holdenkarau" %% "spark-testing-base" % "2.0.2_0.4.7" % "test"
)

I have a simple test. Obviously this is just a starting point; I'd like to test more:

package sparktest

import com.holdenkarau.spark.testing.DataFrameSuiteBase

import org.scalatest.FunSuite

class SampleSuite extends FunSuite with DataFrameSuiteBase {
  test("simple test") {
    assert(1 + 1 === 2)
  }
}

I run sbt clean test and get a failure with:

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf$ConfVars

For my dev environment, I'm using spark-2.0.2-bin-hadoop2.7.tar.gz.

Do I have to configure this environment in any way? I had assumed HiveConf would be pulled in as a transitive Spark dependency.

There is 1 answer.

Holden (accepted answer):

As @daniel-de-paula mentions in the comments, you will need to add spark-hive as an explicit dependency (you can restrict it to the test scope if you aren't using Hive in your application itself). spark-hive is not a transitive dependency of spark-core, which is why this error happened. spark-hive is excluded as a dependency of spark-testing-base so that people who are doing RDD-only tests don't need to pull it in.
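As a minimal sketch, the extra line to add to the libraryDependencies block in build.sbt could look like this, assuming Hive support is only needed for the tests (the 2.0.2 version simply mirrors the other Spark artifacts in the question):

libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.2" % "test"

With that on the test classpath, DataFrameSuiteBase should be able to construct its test session without hitting the missing HiveConf$ConfVars class. Once the suite starts up, a sketch of the kind of DataFrame test the question is aiming for might look like the following; it assumes the sqlContext member and assertDataFrameEquals helper exposed by DataFrameSuiteBase, and the column names and data are made up for illustration:

package sparktest

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class DataFrameSampleSuite extends FunSuite with DataFrameSuiteBase {
  test("a DataFrame equals itself") {
    // sqlContext is provided by DataFrameSuiteBase once the session starts
    import sqlContext.implicits._

    // hypothetical sample data, purely for illustration
    val people = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // assertDataFrameEquals compares schemas and row contents
    assertDataFrameEquals(people, people)
  }
}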