How can I unit test Scala Spark DataFrames (UT/DQ checks) with the ZIO library?

I am new to Scala. I am trying to write unit-test (UT) and data-quality (DQ) assertions for a Scala Spark DataFrame using the ZIO library. Can anyone who has worked with ZIO before help me out?

hughw On

I would recommend spark-fast-tests for making assertions about Spark DataFrames in Scala. ZIO Test isn't one of the frameworks that spark-fast-tests documents support for, but you should still be able to use the two together.

Example

If you have some transformation on a DataFrame that you need to test:

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.DataFrame


object Transformations {
  def appendLiteral(incomingData: DataFrame): DataFrame =
    incomingData.withColumn("foo", lit("bar"))
}

A naive test, which doesn't leverage the wider ZIO effect ecosystem, might look like this:

import com.github.mrpowers.spark.fast.tests.DataFrameComparer
import org.apache.spark.sql.{DataFrame, SparkSession}
import zio.test._
import zio.test.Assertion._

object TransformationsSpec extends ZIOSpecDefault with DataFrameComparer {
  // A local SparkSession created eagerly when the spec object is initialised
  val spark: SparkSession = SparkSession.builder().config("spark.master", "local").getOrCreate()
  import spark.implicits._

  def spec = suite("TransformationsSpec")(
    test("appendLiteral adds a column named 'foo' with value 'bar'") {
      val testInput: DataFrame = Seq("Hello", "hi", "howdy").toDF("greeting")
      val expected: DataFrame = Seq(("Hello", "bar"), ("hi", "bar"), ("howdy", "bar")).toDF("greeting", "foo")

      val result = testInput.transform(Transformations.appendLiteral)

      // assertSmallDataFrameEquality takes the actual DataFrame first, returns Unit
      // on success and throws on a mismatch, which ZIO Test reports as a failed test
      assert(assertSmallDataFrameEquality(result, expected, ignoreNullable = true))(isUnit)
    }
  )
}
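
For comparison, here is a rough sketch of how the same test could lean a little more on ZIO, assuming ZIO 2.x. The SparkSession is provided through a layer that stops the session when the suite finishes, and the Spark calls are wrapped in ZIO.attempt so that exceptions surface as test failures. The names sparkLayer and TransformationsZioSpec, the "zio-test" app name, and the null-count check are illustrative choices of mine, not something spark-fast-tests or ZIO prescribes. The second test is a simple DQ-style check of the kind the question asks about.

```scala
import com.github.mrpowers.spark.fast.tests.DataFrameComparer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import zio._
import zio.test._

object TransformationsZioSpec extends ZIOSpecDefault with DataFrameComparer {

  // Hypothetical layer: acquires a local SparkSession and stops it once the suite is done
  val sparkLayer: ZLayer[Any, Throwable, SparkSession] =
    ZLayer.scoped(
      ZIO.acquireRelease(
        ZIO.attempt(
          SparkSession.builder().master("local[1]").appName("zio-test").getOrCreate()
        )
      )(spark => ZIO.succeed(spark.stop()))
    )

  def spec = suite("TransformationsZioSpec")(
    test("appendLiteral adds a column named 'foo' with value 'bar'") {
      for {
        spark <- ZIO.service[SparkSession]
        // assertSmallDataFrameEquality throws on a mismatch; ZIO.attempt turns that into a failure
        _ <- ZIO.attempt {
          import spark.implicits._
          val input    = Seq("Hello", "hi", "howdy").toDF("greeting")
          val expected = Seq(("Hello", "bar"), ("hi", "bar"), ("howdy", "bar")).toDF("greeting", "foo")
          val result   = input.transform(Transformations.appendLiteral)
          assertSmallDataFrameEquality(result, expected, ignoreNullable = true)
        }
      } yield assertCompletes
    },
    test("appendLiteral leaves no nulls in 'foo' (a simple DQ-style check)") {
      for {
        spark <- ZIO.service[SparkSession]
        nullCount <- ZIO.attempt {
          import spark.implicits._
          val input = Seq("Hello", "hi", "howdy").toDF("greeting")
          Transformations.appendLiteral(input).filter(col("foo").isNull).count()
        }
      } yield assertTrue(nullCount == 0L)
    }
  ).provideLayerShared(sparkLayer) // one session shared by every test in the suite
}
```

Sharing the layer across the suite avoids starting a new SparkSession per test, which is usually the slowest part of a Spark test run.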