I'm trying to run the sample code for the pattern check hasPattern() with PyDeequ, and it fails with an exception.
The code:
import pydeequ
from pyspark.sql import SparkSession, Row
spark = (SparkSession
         .builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())
df = spark.sparkContext.parallelize([
    Row(a="foo", creditCard="5130566665286573", email="[email protected]", ssn="123-45-6789",
        URL="http://[email protected]:8080"),
    Row(a="bar", creditCard="4532677117740914", email="[email protected]", ssn="123456789",
        URL="http://example.com/(something)?after=parens"),
    Row(a="baz", creditCard="3401453245217421", email="[email protected]", ssn="000-00-0000",
        URL="http://[email protected]:8080")]).toDF()
from pydeequ.checks import *
from pydeequ.verification import *
check = Check(spark, CheckLevel.Error, "Integrity checks")
checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(
        check.hasPattern(column='email',
                         pattern=r".*@baz.com",
                         assertion=lambda x: x == 1 / 3)) \
    .run()
checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()
After running it, I receive:
AttributeError: 'NoneType' object has no attribute '_Check'
on the line
check.hasPattern(column='email',
                 pattern=r".*@baz.com",
                 assertion=lambda x: x == 1 / 3)
PyDeequ version: 1.0.1
Python version: 3.7.9
At the moment, the hasPattern method in the pydeequ repository does not appear to be implemented. It has a docstring describing the intended behavior, but no accompanying code to do the actual work.
Without a body, the method always returns None (the default for a Python function with no return statement). The check methods in pydeequ are expected to return the check object itself (the self parameter), which is what lets you chain several checks in sequence. Because hasPattern returns None instead, addCheck is handed None rather than a Check, which would explain the AttributeError about '_Check'.
For comparison, look at the hasPattern method in the pydeequ source, which contains only a docstring, next to the containsCreditCardNumber method, which appears to be fully implemented.
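To make the difference concrete, here is a minimal sketch that needs neither Spark nor pydeequ. SketchCheck, docstring_only, implemented and add_check are all hypothetical names made up for this illustration. They are not pydeequ's actual code and only mimic the pattern described above: a method with nothing but a docstring returns None, while a method that returns self can be chained.

```python
class SketchCheck:
    """Hypothetical stand-in for a pydeequ-style Check; not the library's code."""

    def __init__(self):
        self.constraints = []
        # Placeholder for the wrapped object that a caller might reach into.
        self._Check = object()

    def docstring_only(self, column, pattern):
        """Describes the intended behavior but has no body, so it returns None."""

    def implemented(self, column):
        """Records a constraint and returns self so calls can be chained."""
        self.constraints.append(("implemented", column))
        return self


def add_check(check):
    # Mimics a caller that accesses an attribute on the check it is given.
    return check._Check


check = SketchCheck()
result_missing = check.docstring_only("email", r".*@baz.com")
result_ok = check.implemented("creditCard")

try:
    add_check(result_missing)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(result_missing)      # None
print(result_ok is check)  # True
print(error_message)       # 'NoneType' object has no attribute '_Check'
```

If that is indeed what is happening, any check method that falls through without returning self would fail in the same way once its result is passed to addCheck.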