I am using open data, and sometimes the shape of the datasets changes: for example, some rows go missing. I would like to be warned when that happens.
Here is what I came up with:
import pandas as pd
old_data = [["apple",1,"red"], ["banana", 5, "yellow"], ["peach", 3, "pink"]]
new_data = [["apple", 1, "red"], ["banana", 5, "yellow"], ["peach", 3, "pink"],
            ["orange", 7, "orange"]]
old_dataset = pd.DataFrame(old_data, columns = ["fruit", "nb", "colour"])
new_dataset = pd.DataFrame(new_data, columns = ["fruit", "nb", "colour"])
# True when both datasets have the same number of rows
compare_lengths = len(old_dataset) == len(new_dataset)
if compare_lengths:
    lengths_warning = "OK"
else:
    lengths_warning = "Dataset's length changed"
Then I would gather all the warnings in a single file, where I could see them all at once. I would do this by importing each warning variable, like this:
from file_1 import lengths_warning
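One alternative I am considering, instead of importing a variable from every file, is to have each check write to a shared log file. This is only a minimal sketch, assuming the standard-library `logging` module is suitable for this; the log file name, the logger name and the `check_row_count` function are made up for illustration. It uses plain lists so that it runs on its own, but since it relies only on `len()`, a DataFrame could be passed the same way.

```python
import logging

# Hypothetical log file name; every check writes its messages here.
LOG_FILE = "dataset_warnings.log"

# Hypothetical logger name, shared by all the checks.
logger = logging.getLogger("dataset_checks")
logger.setLevel(logging.INFO)
if not logger.handlers:  # avoid adding the handler twice on re-import
    handler = logging.FileHandler(LOG_FILE, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)


def check_row_count(name, old, new):
    """Log a warning if the row count differs; return True when unchanged."""
    # len() works on a list of rows and on a pandas DataFrame alike.
    if len(old) != len(new):
        logger.warning("%s: row count changed from %d to %d",
                       name, len(old), len(new))
        return False
    logger.info("%s: OK", name)
    return True


old_data = [["apple", 1, "red"], ["banana", 5, "yellow"], ["peach", 3, "pink"]]
new_data = old_data + [["orange", 7, "orange"]]

unchanged = check_row_count("fruits", old_data, new_data)
```

With this, all the messages would end up in `dataset_warnings.log`, and each new check would only need to call the shared logger rather than expose its own variable.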
Is there a better way to do it?
Is it possible to set up warnings/checks in a more efficient way, perhaps with a dedicated module?
Thanks