I am working on AWS Glue and using the PySpark API for my ETL. I believe that if I want to use Amazon Deequ, I need to switch to Scala. However, I would like to continue using the PySpark API. Is there a way around this? If so, what steps do I need to follow in AWS Glue?
Thanks
There is a Python wrapper for Deequ called PyDeequ. It should work, although I haven't used it myself.
If you want to stay in Python, I would also recommend looking at the Great Expectations library, which implements functionality quite similar to Deequ's, including support for PySpark.