I was fitting sklearn's Isolation Forest on one sample and using it to predict on other samples (drawn from the same population). The data is too large to fit the forest on all of it (I cannot even convert it into a pandas DataFrame), hence this approach.
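To make the setup concrete, here is a minimal sketch of this kind of workflow. It assumes synthetic NumPy arrays in place of the real data, and the names train_sample, other_samples and top_k are placeholders for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-ins: one sample to fit on, several others to score.
train_sample = rng.normal(size=(2000, 4))
other_samples = [rng.normal(size=(1000, 4)) for _ in range(3)]

# Fit the forest once, on a single sample only.
forest = IsolationForest(random_state=0).fit(train_sample)

top_k = 10  # hypothetical number of "topmost" anomalies wanted per sample
top_anomalies = []
for chunk in other_samples:
    # score_samples returns lower values for more abnormal points.
    scores = forest.score_samples(chunk)
    # Row indices of the top_k most anomalous points in this chunk.
    top_anomalies.append(np.argsort(scores)[:top_k])
```

Each entry of top_anomalies holds row indices into the corresponding chunk, most anomalous first.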
As this was taking a long time, I was thinking about reducing the maximum depth of the trees. According to sklearn's documentation, "The maximum depth of each tree is set to ceil(log_2(n)) where n is the number of samples used to build the tree (see (Liu et al., 2008) for more details)."
But since I only want the topmost anomalies, I think I could limit the depth of the trees to something even less than ceil(log_2(n)), which might reduce the time taken to fit the model and to make predictions with it. However, sklearn's IsolationForest does not expose a max_depth parameter. Is there a way to specify max_depth in sklearn?
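For reference, the documented ceil(log_2(n)) cap can be checked on a fitted forest. The sketch below assumes synthetic data and an arbitrary max_samples of 64. It relies on n in the formula being the per-tree subsample size, which is set through the max_samples parameter, so the only depth control shown here is indirect.

```python
import math

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))  # synthetic stand-in data

# n in the documented formula: samples drawn to build each tree.
max_samples = 64  # arbitrary value chosen for illustration

forest = IsolationForest(
    n_estimators=20, max_samples=max_samples, random_state=0
).fit(X)

# Depth cap according to the documentation: ceil(log_2(n)).
depth_cap = math.ceil(math.log2(max_samples))

# Actual depth of each fitted tree in the ensemble.
tree_depths = [tree.get_depth() for tree in forest.estimators_]

print("documented cap:", depth_cap)
print("deepest fitted tree:", max(tree_depths))
```

So lowering max_samples lowers the cap, but it also changes how much data each tree sees, which is not the same as capping the depth independently.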