How do you convert a dataframe to a great_expectations dataset?


I have a pandas or PySpark DataFrame df that I want to run an expectation against. The DataFrame is already in memory. How can I convert it to a great_expectations dataset?

That way I could, for example, run:

df.expect_column_to_exist("my_column")

Answer by Vincent Claes (accepted):
import great_expectations as ge

For pandas:

df_ge = ge.from_pandas(df)

or

df_ge = ge.dataset.PandasDataset(df)

For PySpark:

df_ge = ge.dataset.SparkDFDataset(df)

Now you can run your expectation:

df_ge.expect_column_to_exist("my_column")

Note that, unlike PandasDataset (which subclasses pandas.DataFrame), the great_expectations SparkDFDataset does not inherit the methods of the PySpark DataFrame. You can access the original PySpark DataFrame via df_ge.spark_df.
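If you need native DataFrame methods again after validating, a small hypothetical helper can hide the difference between the two legacy wrappers. It is a sketch, not a great_expectations API. It assumes the legacy classes described in this answer: PandasDataset is itself a pandas DataFrame, while SparkDFDataset keeps the wrapped PySpark DataFrame on its spark_df attribute.

```python
def unwrap(df_ge):
    """Return a native DataFrame for a legacy GX Dataset (hypothetical helper)."""
    try:
        import pandas as pd
    except ImportError:
        pd = None
    # PandasDataset subclasses pandas.DataFrame, so it already supports pandas
    # methods. Checking this first also stops pandas' attribute-style column
    # access from returning a column that happens to be named "spark_df".
    if pd is not None and isinstance(df_ge, pd.DataFrame):
        return df_ge
    # SparkDFDataset does not inherit from pyspark.sql.DataFrame; it stores
    # the original DataFrame on spark_df. Anything else is returned unchanged.
    return getattr(df_ge, "spark_df", df_ge)
```

For a SparkDFDataset, unwrap(df_ge) gives back the original PySpark DataFrame, so calls such as .select(...) work again.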

Answer by spbail:

See also the Great Expectations documentation tutorial for another walkthrough of converting a pandas DataFrame with ge.from_pandas: https://docs.greatexpectations.io/en/latest/guides/tutorials/explore_expectations_in_a_notebook.html