Use primitive_options on Featuretools to calc feature_matrix

I have a dataset with more than 30,000 rows, like the picture below, and I need to generate some features with the Featuretools library.

import pandas as pd
import featuretools as ft

# Read in the full dataset
df_data = pd.read_csv('dataset.csv')

es = ft.EntitySet('time_series')

# Add df_data to the EntitySet as the 'app' dataframe
es = es.add_dataframe(dataframe_name='app', dataframe=df_data)

rol_Average_primitive = ft.primitives.RollingMean(window_length=300,
                                     gap=200,
                                     min_periods=300)

rol_Standard_deviation_primitive = ft.primitives.RollingSTD(window_length=300,
                                     gap=100,
                                     min_periods=300)


feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name='app',
    trans_primitives=[rol_Average_primitive, rol_Standard_deviation_primitive, ft.primitives.CumMean()],
    # Attempted option (shown in bold in the original post); enabling it raises the TypeError quoted below:
    # primitive_options={rol_Average_primitive: {"include_variables": {"app": ["A"]}}},
    cutoff_time=pd.Timestamp('2022-02-15'),
)

feature_defs

print('Saving features')

feature_matrix.to_csv('feature_matrix.csv')

When I run the code, I get this warning:

UnusedPrimitiveWarning: Some specified primitives were not used during DFS: trans_primitives: ['rolling_mean', 'rolling_std'] This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible columns for the primitive were found in the data.

I tried to use primitive_options, as in the commented-out line marked in bold in the code, but it doesn't work and gives this error: TypeError: sequence item 0: expected str instance, RollingMean found

Can someone help, please? Thanks in advance.

There is 1 answer

dvreed77

I think your issue is that you are not setting a time_index when you add your dataframe. Here is a distilled version of your problem:

import pandas as pd
import numpy as np
import featuretools as ft

# create some fake data
df = pd.DataFrame({
    'idx': np.arange(10000),
    "nums": [1,2,3,4,5,6,7,8,9,10]*1000,
    "times": pd.date_range(start='2020-01-01', freq='1h', periods=10000)
})

es = ft.EntitySet('test_es')

es = es.add_dataframe(
    dataframe_name='data',
    dataframe=df,
    # time_index="times"
)

rol_Average_primitive = ft.primitives.RollingMean(
    window_length=100,
    gap=1,
    min_periods=100
)


feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name='data',
    trans_primitives=[rol_Average_primitive],
    features_only=True
)

feature_defs

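This reproduces the same UnusedPrimitiveWarning you are seeing. If you uncomment the time_index line, it should work: the rolling primitives take the dataframe's time index column as one of their inputs, so without one DFS finds no compatible columns for them.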