Is there any way to skip the footer of an S3 file before running the expectations?
I have an example that works fine but breaks when the files have a footer. To avoid that, I want to ignore the last X lines, but I'm not sure whether this is supported.
from great_expectations.data_context import DataContext

# Values substituted into the checkpoint config at runtime (left blank here).
validation_config = {
    "EXPECTATION_SUITE_NAME": "",
    "HTML_RESULT_S3_BUCKET": "",
    "HTML_RESULT_S3_PREFIX": "",
    "JSON_RESULT_S3_BUCKET": "",
    "JSON_RESULT_S3_PREFIX": "",
    "VALIDATION_EMAIL_RECIPIENTS": [""],
}
# S3_PATH and ASSET_NAME identify the file to validate.
data_context = DataContext(
    runtime_environment={"S3_PATH": "", "ASSET_NAME": "", **validation_config}
)
checkpoint = data_context.get_checkpoint(name="default_checkpoint")
result = checkpoint.run()
The footer of the files I have in S3 breaks the expectations, so I would like to skip the last X rows before running validations.
Is there any way to do this?
Example of my expectation suite JSON file:
{
  "data_asset_type": null,
  "expectation_suite_name": "my_suite_name",
  "expectations": [
    {
      "expectation_type": "expect_column_to_exist",
      "kwargs": {
        "column": "column1"
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_column_to_exist",
      "kwargs": {
        "column": "column2"
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_be_in_set",
      "kwargs": {
        "column": "column1",
        "value_set": [
          "Yes",
          "No"
        ]
      },
      "meta": {}
    }
  ],
  "ge_cloud_id": null,
  "meta": {
    "great_expectations_version": "0.15.48"
  }
}
And let's say the data file looks something like this:
column1,column2
Yes,description
No,description
My_Footer,2023
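
To make the desired behavior concrete, here is a minimal sketch in plain Python (not Great Expectations itself) of dropping the last X lines before the values are checked. The helper name `drop_footer` and its `footer_lines` parameter are hypothetical, and the sample text is the file shown above. If the file is read through pandas, `pandas.read_csv` has a `skipfooter` parameter (used with `engine="python"`) that does the same job. Whether and how the checkpoint can forward such a reader option is an assumption, not something confirmed here.

```python
import csv
import io


def drop_footer(csv_text: str, footer_lines: int) -> str:
    """Return csv_text without its last `footer_lines` non-empty lines."""
    # Ignore blank lines so a trailing newline is not counted as the footer.
    lines = [line for line in csv_text.splitlines() if line.strip()]
    if footer_lines > 0:
        lines = lines[:-footer_lines]
    return "\n".join(lines) + "\n"


# Same content as the sample file above, footer included.
sample = "column1,column2\nYes,description\nNo,description\nMy_Footer,2023\n"

cleaned = drop_footer(sample, footer_lines=1)
rows = list(csv.DictReader(io.StringIO(cleaned)))

# With the footer gone, column1 only holds values from the expected set.
column1_values = {row["column1"] for row in rows}
```

With the footer removed, `column1` would contain only "Yes" and "No", so the `expect_column_values_to_be_in_set` expectation in the suite above should no longer trip on `My_Footer`.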