By what criteria should the bootstrap parameter be selected in Isolation Forest?

The IsolationForest model in scikit-learn has a parameter called bootstrap. Its documentation reads:

If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.
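To make the two sampling modes concrete, here is a minimal sketch using only Python's standard library. It is not scikit-learn's internal code; the names `indices` and `max_samples` are illustrative stand-ins for the training rows and the per-tree sample size.

```python
import random

random.seed(0)
indices = list(range(11))  # stand-in for 11 training rows
max_samples = 4            # rows handed to a single tree

# Analogue of bootstrap=False: no row can be drawn twice.
without_replacement = random.sample(indices, k=max_samples)

# Analogue of bootstrap=True: the same row may be drawn more than once.
with_replacement = random.choices(indices, k=max_samples)

print(without_replacement, "distinct rows:", len(set(without_replacement)))
print(with_replacement, "distinct rows:", len(set(with_replacement)))
```

Without replacement the tree always sees `max_samples` distinct rows; with replacement it may see fewer.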

I made a simple dataset and trained an Isolation Forest model on it, but the evaluation results were quite different depending on whether bootstrap was True or False. Please refer to the code below.

import numpy as np
from sklearn.ensemble import IsolationForest

np.random.seed(0)

# Build train and test data: `size` inliers in [0, 1) plus one far outlier
size = 10
train_x = np.concatenate((np.random.uniform(0, 1, size=(size, 1)), np.array([[100]])), axis=0)
train_y = [1] * size + [-1]
test_x = np.concatenate((np.random.uniform(0, 1, size=(size, 1)), np.array([[102]])), axis=0)
test_y = train_y.copy()

# Fraction of predictions that match the true labels
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# when bootstrap = True
iso = IsolationForest(n_estimators=100, max_samples=4, max_features=1.0, bootstrap=True, random_state=0)
iso.fit(train_x)
predicted_y = iso.predict(test_x)
print(accuracy(test_y, predicted_y)) # 0.8182

# when bootstrap = False
iso = IsolationForest(n_estimators=100, max_samples=4, max_features=1.0, bootstrap=False, random_state=0)
iso.fit(train_x)
predicted_y = iso.predict(test_x)
print(accuracy(test_y, predicted_y)) # 1.0

My questions are:

  1. What is the role of the bootstrap parameter in Isolation Forest?
  2. By what criteria should the bootstrap parameter be selected in Isolation Forest?

Please let me know when to select True and when to select False.

There is 1 answer

Answered by NotAName:

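If bootstrap is set to False, each tree is still fit on a random subset of max_samples points from the training data. The subset is just drawn without replacement, so no point appears twice in the same tree. The trees are not identical, because both the subsets and the split points are chosen at random.

In Random Forest style models, a bootstrap sample (i.e. drawn with replacement) is taken from the dataset for each of the trees, and this allows the model to generalise much better than a single decision tree can. Isolation Forest relies on small random subsamples instead, and scikit-learn's IsolationForest defaults to bootstrap=False.

Long story short, bootstrap=False is the usual choice for an Isolation Forest. With bootstrap=True and a tiny max_samples such as 4, a tree can receive duplicate points and so sees even fewer distinct values, which can make the scores noisier. On a dataset as small as the one in the question, the gap in accuracy most likely reflects that sampling noise more than anything systematic.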
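One way to see how much of the gap in the question comes from sampling noise is to repeat the experiment over several seeds and compare the averages. The following is a rough sketch under a few assumptions: `make_data` and `mean_accuracy` are hypothetical helper names, the data layout mirrors the question (10 inliers plus one outlier at 100 for training and 102 for testing), and the choice of 10 seeds is arbitrary.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def make_data(rng, outlier, size=10):
    # Same layout as the question: `size` inliers in [0, 1) plus one far outlier.
    x = np.concatenate((rng.uniform(0, 1, size=(size, 1)), np.array([[outlier]])), axis=0)
    y = np.array([1] * size + [-1])
    return x, y


def mean_accuracy(bootstrap, seeds=range(10)):
    # Hypothetical helper: average test accuracy over several random seeds.
    scores = []
    for seed in seeds:
        rng = np.random.RandomState(seed)
        train_x, _ = make_data(rng, outlier=100.0)
        test_x, test_y = make_data(rng, outlier=102.0)
        iso = IsolationForest(n_estimators=100, max_samples=4,
                              bootstrap=bootstrap, random_state=seed)
        iso.fit(train_x)
        scores.append(np.mean(iso.predict(test_x) == test_y))
    return float(np.mean(scores))


print("bootstrap=True :", mean_accuracy(True))
print("bootstrap=False:", mean_accuracy(False))
```

Averaging over seeds will not settle which setting is better in general, but it gives a fairer picture than a single run with random_state=0.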