PySpark version of Amazon Deequ


I am working on AWS Glue and using the PySpark API for my ETL. I believe that to use Amazon Deequ I would need to switch to Scala, but I want to continue using the PySpark APIs. Is there a way to do this? If so, what steps do I need to follow in AWS Glue?

Thanks

There is 1 answer

Alex Ott (best answer)

There is a Python wrapper for Deequ called PyDeequ. It should work, although I haven't used it myself.
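
For illustration, here is a minimal sketch of what a PyDeequ check might look like, based on the library's documented `Check` / `VerificationSuite` interface. It has not been run on Glue. The column name `id`, the `SPARK_VERSION` value, and the helper names `local_session`, `run_checks` and `failed_constraints` are assumptions made for this example, not part of PyDeequ.

```python
import os

# Assumption: recent PyDeequ releases read SPARK_VERSION when imported;
# set it to the Spark version your cluster actually runs.
os.environ.setdefault("SPARK_VERSION", "3.3")


def local_session():
    """Build a Spark session that pulls the Deequ jar from Maven (local testing)."""
    # Imported lazily so this module loads even where PyDeequ is not installed.
    import pydeequ
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.jars.packages", pydeequ.deequ_maven_coord)
        .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
        .getOrCreate()
    )


def run_checks(spark, df):
    """Run a few Deequ constraints on df and return the result rows as dicts."""
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationResult, VerificationSuite

    check = (
        Check(spark, CheckLevel.Error, "basic checks")
        .hasSize(lambda n: n > 0)   # the DataFrame is not empty
        .isComplete("id")           # hypothetical column: no NULLs
        .isUnique("id")             # hypothetical column: no duplicates
    )
    result = VerificationSuite(spark).onData(df).addCheck(check).run()
    result_df = VerificationResult.checkResultsAsDataFrame(spark, result)
    return [row.asDict() for row in result_df.collect()]


def failed_constraints(result_rows):
    """Return the constraints whose status is anything other than 'Success'."""
    return [
        r["constraint"]
        for r in result_rows
        if r["constraint_status"] != "Success"
    ]
```

In a Glue job you would presumably pass `glueContext.spark_session` and your own DataFrame to `run_checks` instead of calling `local_session()`. You would also need to make the library and the jar available to the job, for example through the `--additional-python-modules` and `--extra-jars` job parameters. Treat that setup as an assumption to verify against the Glue and PyDeequ documentation.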

If you want to use Python, I would also recommend looking at the Great Expectations library, which implements functionality quite similar to Deequ's, including support for PySpark.