I am running some experiments with the Remaining Useful Life prediction example on the Turbofan Engine Degradation Simulation Data Set from NASA. I want to build features from only a small number of data points before each cut-off time, so I am passing training_window="50m" to the featuretools.dfs function. The window is a multiple of the sampling interval: I generated a time column for the dataframe with a frequency of 600s, so a 50-minute window should select 5 values before each cut-off time. However, with this parameter dfs returns an empty feature matrix, and so far I have not been able to figure out why. I am using the same code as given in this notebook, with the following changes:
- I used the CidCe primitive from the advanced notebook.
- I used the following code to search for labels, which selects duplicate entries as well:
label_times = lm.search(
    data.sort_values('time'),
    num_examples_per_instance=5,
    minimum_data=100,
    drop_empty=False,
    gap=10,
    verbose=True,
)
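As a sanity check on the window arithmetic described above, here is a minimal sketch in plain Python that does not use featuretools at all. The timestamps, the start date, and the helper rows_in_window are hypothetical and only for illustration. It assumes the window is left-exclusive and right-inclusive, i.e. (cutoff - window, cutoff]; if the real boundary handling differs, the count could be off by one.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for one engine's records: one reading every 600 s.
start = datetime(2000, 1, 1)
step = timedelta(seconds=600)
times = [start + i * step for i in range(20)]


def rows_in_window(times, cutoff, window):
    """Return the timestamps that fall in (cutoff - window, cutoff].

    Assumption: the window excludes its left edge and includes the cut-off.
    """
    return [t for t in times if cutoff - window < t <= cutoff]


# Pick an arbitrary cut-off that coincides with a reading.
cutoff = times[12]
selected = rows_in_window(times, cutoff, timedelta(minutes=50))

# With a 600 s sampling interval, a 50-minute window covers 5 readings.
print(len(selected))
```

Under these assumptions the count comes out as expected, so the window size itself is not what produces an empty feature matrix.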
The mistake was mine. The documentation does say that I need to add the last time indexes myself, but in my defense, I never got the warning about this that the documentation mentions. I fixed it by calling es.add_last_time_indexes().