I am trying to implement a Deequ check: the number of distinct date_start values should match the number of days between 2018-01-01 and $runDate.
Here is what I do. First, I calculate the date difference:
import java.time.LocalDate

val min_dt = LocalDate.of(2018, 1, 1)
// Adjust max_dt to account for the difference between the Airflow daily DAG run_date and the hourly run_date
// Also account for Days.between being exclusive of the end date
val max_dt: LocalDate = if (check_sched == "daily") runDate.plusDays(2) else runDate.plusDays(1)
val expected_count: Long = min_dt.toEpochDay() - max_dt.toEpochDay()
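One way to sanity-check the arithmetic above is to run it in isolation. The sketch below is only illustrative: `DateDiffSketch`, its helper names, and the example end date are hypothetical and not taken from the job. It compares the subtraction order used above with the reversed order and with `ChronoUnit.DAYS.between`, which takes the start date first.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DateDiffSketch {
  // Same order as in the snippet above: start minus end.
  def startMinusEnd(start: LocalDate, end: LocalDate): Long =
    start.toEpochDay - end.toEpochDay

  // Reversed order: end minus start.
  def endMinusStart(start: LocalDate, end: LocalDate): Long =
    end.toEpochDay - start.toEpochDay

  // java.time equivalent: start inclusive, end exclusive.
  def daysBetween(start: LocalDate, end: LocalDate): Long =
    ChronoUnit.DAYS.between(start, end)

  def main(args: Array[String]): Unit = {
    val start = LocalDate.of(2018, 1, 1)
    val end = LocalDate.of(2018, 1, 11) // hypothetical end date, for illustration only
    println(startMinusEnd(start, end))
    println(endMinusStart(start, end))
    println(daysBetween(start, end))
  }
}
```

With these example dates the first value is negative and the other two are equal, so printing expected_count before building the check would show which sign it actually has at runtime.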
Then I add a check using hasSize:
val assert_func = (size: Long) => size == expected_count
val basic_checks: Check =
  Check(CheckLevel.Error, s"date_start distinct values should match number of days between 2018-01-01 and $runDate")
    .hasSize(assert_func,
      Some(s"date_start distinct values should match number of days between 2018-01-01 and $runDate")
    )
But the check fails.
Now, if I just put a hard-coded value in place of expected_count, the check passes:
val assert_func = (size: Long) => size == 1949
val basic_checks: Check =
  Check(CheckLevel.Error, s"date_start distinct values should match number of days between 2018-01-01 and $runDate")
    .hasSize(assert_func,
      Some(s"date_start distinct values should match number of days between 2018-01-01 and $runDate")
    )
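Because hasSize only receives a Long => Boolean function, the two versions can also be compared outside Deequ. This is a minimal sketch that assumes nothing about Deequ itself: `AssertionSketch` and `sizeEquals` are hypothetical names, and -1949 is just an example of a captured value that differs from the observed size.

```scala
object AssertionSketch {
  // Builds the same kind of Long => Boolean function that is passed to hasSize.
  def sizeEquals(expected: Long): Long => Boolean =
    (size: Long) => size == expected

  def main(args: Array[String]): Unit = {
    val observed = 1949L // the size for which the hard-coded check passes
    // Literal expectation, as in the hard-coded version.
    println(sizeEquals(1949L)(observed))
    // Hypothetical captured value that does not match the observed size.
    println(sizeEquals(-1949L)(observed))
  }
}
```

A closure built this way captures its expected value like any other Scala function, so if the variable version fails while the literal passes, the likelier explanation is that expected_count evaluates to something other than 1949.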
I am not sure why the value of expected_count is not being resolved here. The Deequ hasSize check is defined here: https://github.com/awslabs/deequ/blob/ea52006fa7c8754459afedeec65ebae6c0074018/src/main/scala/com/amazon/deequ/checks/Check.scala#L112
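A related point: as far as the linked source shows, hasSize applies the assertion to the number of rows in the DataFrame, which equals the number of distinct date_start values only if every date appears exactly once. The difference can be illustrated on a plain collection. This is a sketch with made-up data; `RowsVsDistinctSketch` and its helpers are hypothetical and do not use Deequ.

```scala
import java.time.LocalDate

object RowsVsDistinctSketch {
  // Total number of entries, analogous to a row count.
  def rowCount(dates: Seq[LocalDate]): Long = dates.size.toLong

  // Number of different dates among the entries.
  def distinctCount(dates: Seq[LocalDate]): Long = dates.distinct.size.toLong

  def main(args: Array[String]): Unit = {
    // Made-up sample: three entries, one date repeated.
    val dates = Seq(
      LocalDate.of(2018, 1, 1),
      LocalDate.of(2018, 1, 1),
      LocalDate.of(2018, 1, 2)
    )
    println(rowCount(dates))
    println(distinctCount(dates))
  }
}
```

If date_start can repeat, the row count and the distinct count diverge, and an assertion on size alone would not test what the check description says.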