I have a dataset with more than 30,000 rows, like the one in the picture below, and I need to generate some features with the featuretools library.
import pandas as pd
import featuretools as ft

# Read in the full dataset
df_data = pd.read_csv('dataset.csv')

es = ft.EntitySet('time_series')
# Add entity from table df_data
es = es.add_dataframe(dataframe_name='app', dataframe=df_data)

rol_Average_primitive = ft.primitives.RollingMean(window_length=300,
                                                  gap=200,
                                                  min_periods=300)
rol_Standard_deviation_primitive = ft.primitives.RollingSTD(window_length=300,
                                                            gap=100,
                                                            min_periods=300)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name='app',
    trans_primitives=[rol_Average_primitive,
                      rol_Standard_deviation_primitive,
                      ft.primitives.CumMean()],
    # Attempted option; enabling it raises the TypeError described below:
    # primitive_options={rol_Average_primitive: {"include_variables": {"app": ["A"]}}},
    cutoff_time=pd.Timestamp('2022-02-15')
)
feature_defs

print('Saving features')
feature_matrix.to_csv('feature_matrix.csv')
If I try to run the code, I get this warning:
UnusedPrimitiveWarning: Some specified primitives were not used during DFS: trans_primitives: ['rolling_mean', 'rolling_std'] This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible columns for the primitive were found in the data.
I tried to use primitive_options, as in the commented-out primitive_options line in the code, but it does not work and raises this error: TypeError: sequence item 0: expected str instance, RollingMean found
Can someone help, please? Thanks in advance for any help.
I think your issue is that you are not setting a time_index when you add your dataframe. Here is a distilled version of your problem: This will give you the same error. If you uncomment the time_index line, it should work.