How to Use Python to Identify and Report Bad Data Instances?


This is a general question: is anyone aware of a library, similar to sklearn, with a function that reads in data and reports strange behavior or data-quality concerns, after the user specifies what type of data it is? Examples of problems it should catch:

  • Flat values for an extended period of time (e.g. the variance of the last N time-series records suddenly dropping to 0)
  • Sudden jumps in the data (values cliff-dropping to 0 and then jumping back up to normal, or an extremely high rate of change)
  • And so on...
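The flat-value check in the first bullet can be sketched with a pandas rolling window. This is a minimal sketch, assuming the data is a numeric `pd.Series` in time order; `find_flat_segments`, `window` and `tol` are hypothetical names, with `window` playing the role of N.

```python
import pandas as pd


def find_flat_segments(series: pd.Series, window: int = 5, tol: float = 1e-12) -> pd.Series:
    """Return a boolean Series that is True at each position where the
    trailing `window` records have (near-)zero variance, i.e. the signal
    has been flat for at least `window` consecutive records."""
    # Rolling sample variance over the last `window` records. The first
    # window - 1 positions are NaN because the window is not yet full.
    rolling_var = series.rolling(window=window).var()
    # `tol` absorbs tiny floating-point residue from the rolling computation.
    # Comparisons with NaN evaluate to False, so partial windows are never flagged.
    return rolling_var <= tol


# Hypothetical example: the value sticks at 4.0 for five records (indices 5-9).
s = pd.Series([1.0, 2.5, 1.8, 3.1, 2.2, 4.0, 4.0, 4.0, 4.0, 4.0, 2.7, 3.3])
flags = find_flat_segments(s, window=5)
print(flags[flags].index.tolist())  # → [9]
```

Each True marks the end of a window of N identical values, so a flat run of length L ≥ N produces L - N + 1 flags; a run shorter than N is not reported.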

Example (Good): [image not preserved]

(Bad - Dropping to 0): [image not preserved]

(Bad - Flat/constant value when non-constant is expected): [image not preserved]
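The "dropping to 0" case (a cliff-drop to 0 followed by a recovery, or more generally an extremely high rate of change) could be checked with two small pandas functions. This is a minimal sketch, assuming a numeric `pd.Series` in time order; `find_zero_dropouts`, `find_jumps`, `max_len` and `threshold` are hypothetical names. The jump check is a swapped-in, generic technique (the modified z-score of the first differences, built from the median and the median absolute deviation, with the conventional 0.6745 scale factor and 3.5 cutoff), not the API of any existing library.

```python
import pandas as pd


def find_zero_dropouts(series: pd.Series, max_len: int = 3) -> pd.Series:
    """Flag short runs (at most `max_len` records) of exact zeros that have
    non-zero data on both sides, i.e. a cliff-drop to 0 followed by a recovery."""
    if series.empty:
        return pd.Series(False, index=series.index, dtype=bool)
    is_zero = series == 0
    # Give every run of consecutive equal `is_zero` values its own integer id.
    run_id = is_zero.ne(is_zero.shift()).astype(int).cumsum()
    # Length of the run each record belongs to.
    run_len = run_id.map(run_id.value_counts())
    # Runs touching either end of the series are not "drop and recover".
    interior = (run_id != run_id.iloc[0]) & (run_id != run_id.iloc[-1])
    return is_zero & (run_len <= max_len) & interior


def find_jumps(series: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag records whose change from the previous record is an outlier,
    using a modified z-score (median and MAD) of the first differences."""
    diff = series.diff()
    med = diff.median()
    mad = (diff - med).abs().median()
    if mad == 0 or pd.isna(mad):
        # Steps are all identical (or there is too little data): nothing to score.
        return pd.Series(False, index=series.index, dtype=bool)
    robust_z = 0.6745 * (diff - med) / mad
    # The first diff is NaN; NaN > threshold is False, so it is never flagged.
    return robust_z.abs() > threshold


# Hypothetical example: the signal drops to 0 for two records, then recovers.
s = pd.Series([10.0, 10.4, 9.8, 10.1, 0.0, 0.0, 10.2, 9.9, 10.3, 10.0])
dropouts = find_zero_dropouts(s)
jumps = find_jumps(s)
print(dropouts[dropouts].index.tolist())  # → [4, 5]
print(jumps[jumps].index.tolist())        # → [4, 6]
```

A drop to 0 and back shows up in both checks: `find_jumps` flags the step down and the step back up, while `find_zero_dropouts` flags the zero records themselves. Using the median and MAD instead of the mean and standard deviation keeps the jumps from inflating the very statistic used to detect them.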

If such a library already exists, I would appreciate it if someone could point me to its name, so I can avoid "re-inventing the wheel" and also see what other analysis methods exist that I have not thought of checking for.
