Every row of my dataframe contains a record with a unique key combination. The data validation will be based on both the columns and the key combination. For example, within a single column, cells may have different min/max requirements depending on the key combination.
Several questions:
- Can Pandera validate on a per-cell basis as opposed to a per-column basis?
- Does Pandera have a schema generator capable of this type of flexibility? Perhaps one that scans a "golden dataframe" as a starting point and creates a schema based on some provided criteria. I realize the schema generator output may need a bit of tweaking.
The library does look cool, and I am interested in pursuing it further.
thanks
So you can create a validator that validates a single value at a time with the `element_wise=True` kwarg, you can read more here. The function must take an individual value as input and output a boolean.
Can you elaborate on the exact check that you want to perform? If you want a row-wise check that looks across columns, you can attach an element-wise check at the dataframe level as a wide check, in which case the function receives one row at a time.
You can use the `schema = pandera.infer_schema(golden_dataframe)` function to bootstrap a starter schema, then write it out to a file with `schema.to_script("path/to/file")` to further iterate.