Great Expectations: how to skip the footer of an S3 file


Is there any way to skip the footer of an S3 file before running the expectations?

I have an example that works fine but breaks when the files have a footer. To avoid that, I want to ignore the last X lines, but I am not sure whether this is supported.

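from great_expectations.data_context import DataContext

# Settings passed to the Data Context at runtime (values left blank here).
validation_config = {
    "EXPECTATION_SUITE_NAME": "",
    "HTML_RESULT_S3_BUCKET": "",
    "HTML_RESULT_S3_PREFIX": "",
    "JSON_RESULT_S3_BUCKET": "",
    "JSON_RESULT_S3_PREFIX": "",
    "VALIDATION_EMAIL_RECIPIENTS": [""],
}

# runtime_environment supplies values for config variable substitution.
data_context = DataContext(runtime_environment={"S3_PATH": "", "ASSET_NAME": "", **validation_config})

# Load the pre-configured checkpoint and run the validation.
checkpoint = data_context.get_checkpoint(name="default_checkpoint")
result = checkpoint.run()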

The footer of the files I have in S3 breaks the expectations, so I would like to skip the last X rows before running the validations.

Is there any way to do this?

Example of my expectation suite JSON file:

{
"data_asset_type": null,
"expectation_suite_name": "my_suite_name",
"expectations": [
  {
    "expectation_type": "expect_column_to_exist",
    "kwargs": {
      "column": "column1"
    },
    "meta": {}
  },
  {
    "expectation_type": "expect_column_to_exist",
    "kwargs": {
      "column": "column2"
    },
    "meta": {}
  },
  {
    "expectation_type": "expect_column_values_to_be_in_set",
    "kwargs": {
      "column": "column1",
      "value_set": [
        "Yes",
        "No"
      ]
    },
    "meta": {}
  }
],
"ge_cloud_id": null,
"meta": {
  "great_expectations_version": "0.15.48"
}
}

And let's say the file is something like this:

column1,column2
Yes,description
No,description

My_Footer,2023
0
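
To make the "skip last X rows" idea concrete, here is a minimal sketch of stripping the footer as a pre-processing step, before the data reaches the validation. It is not a Great Expectations feature. The helper name `strip_footer` and the `footer_lines` parameter are hypothetical, and the sketch assumes the footer is the last two non-blank lines, as in the sample above. Fetching the object from S3 and handing the cleaned data to the checkpoint are not shown.

```python
import csv
import io


def strip_footer(csv_text: str, footer_lines: int) -> str:
    """Return csv_text without its last `footer_lines` non-blank lines.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    # Drop blank lines first so the empty separator before the footer
    # is not counted as one of the footer lines.
    lines = [line for line in csv_text.splitlines() if line.strip()]
    if footer_lines > 0:
        lines = lines[:-footer_lines]
    return "\n".join(lines) + "\n"


# The sample file from the question, as a string.
sample = (
    "column1,column2\n"
    "Yes,description\n"
    "No,description\n"
    "\n"
    "My_Footer,2023\n"
    "0\n"
)

# Assumption: the footer is the last two non-blank lines.
cleaned = strip_footer(sample, footer_lines=2)

# Parse what is left to confirm that only the header and data rows remain.
rows = list(csv.DictReader(io.StringIO(cleaned)))
print(rows)
```

If the data is loaded with pandas, `pandas.read_csv` accepts a `skipfooter` argument (with `engine="python"`) that serves the same purpose. Whether the cleaned data can then be passed to the checkpoint depends on how the datasource is configured, for example whether it accepts in-memory batch data at runtime.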
