I'm working with the featuretools package and have a question about the function remove_highly_correlated_features. The documentation says: "We make the assumption that, for a pair of features, the feature that is further right in the feature matrix produced by dfs is the more complex one". How does dfs calculate the complexity of the features and then sort them? Why are the features on the right more complex? Thanks.
A question about remove_highly_correlated_features
88 views Asked by liyang At
remove_highly_correlated_features treats the depth of a feature (feature.get_depth()) as a rudimentary measure of complexity, the idea being that features created by stacking primitives on top of each other are more complex. The features DFS outputs are sorted by ascending depth, so for any pair of features the one further right is at least as deep as the one on its left. This is only an approximation of complexity, and there could be scenarios where a user would prefer to keep a different feature from a correlated pair.
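To make the idea concrete, here is a minimal sketch in plain Python. It is not the featuretools implementation: the helper names (pearson, drop_correlated), the feature names, the depths, and the data are all hypothetical, and the 0.95 threshold is just an illustrative choice. It only shows the general technique of ordering features by ascending depth and, within a highly correlated pair, dropping the one further right.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column is treated as uncorrelated here
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, depths, threshold=0.95):
    """Keep a feature only if it is not highly correlated with a kept,
    shallower (further left) feature. Hypothetical helper, not library code."""
    # Sort by ascending depth; sorted() is stable, so ties keep their order.
    ordered = sorted(columns, key=lambda name: depths[name])
    kept = []
    for name in ordered:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature matrix: column name -> values.
columns = {
    "STD(sessions.SUM(orders.amount))": [21, 7, 15, 3, 13],
    "age": [23, 35, 41, 52, 60],
    "SUM(orders.amount)": [10, 3, 7, 1, 6],
}
# Hypothetical depths, standing in for what feature.get_depth() would report.
depths = {
    "age": 0,
    "SUM(orders.amount)": 1,
    "STD(sessions.SUM(orders.amount))": 2,
}

kept = drop_correlated(columns, depths)
print(kept)
```

In this made-up data the depth-2 column is a linear function of the depth-1 column, so the deeper one is the one that gets dropped. If you would rather keep a different member of a correlated pair, you would need a different ordering rule than depth.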