In precision medicine, decision trees have been used to partition patients into subgroups that are likely to respond similarly to a treatment (that is, to have a similar treatment effect). In this setting, it is crucial to find a stable tree structure for deciding which patients should or should not be treated.
However, as is well known, a decision based on a single tree is not robust, because the structure of a single tree is unstable: it can change substantially with small changes in the data. Ensemble algorithms (building many trees and averaging them) can improve predictive accuracy, but it then becomes unclear which tree should be chosen for the decision.
Therefore, in an ensemble algorithm such as random forest, we want to calculate the similarity/distance between each tree and the others, and find the most reliable and representative one for decision making.
So, is there any reliable theory and code to support this?
The distance between different trees is subject to your interpretation. Having said that, I think you can use the following:
One of the major differences between a decision tree and a random forest is that an RF builds an ensemble of trees, each trained on a bootstrap sample of the data and considering only a random subset of the features at each split. The classification decision is an aggregate over the trees (a majority vote or an average), which in most cases results in better decisions. Hence, we consider an RF going forward.
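Since the distance between trees is open to interpretation, here is a minimal sketch of one possible choice, not an established method: treat the distance between two trees as the fraction of samples on which their predictions disagree, and pick the tree with the smallest mean distance to all the others (a medoid). The synthetic data, the disagreement metric, and the variable names are all assumptions made for illustration; it uses scikit-learn's `RandomForestClassifier` and its fitted `estimators_` list.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for real patient data (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Predictions of every individual tree, shape (n_trees, n_samples).
# In practice a held-out set would be a better basis for comparison.
tree_preds = np.array([tree.predict(X) for tree in forest.estimators_])

# Assumed distance: fraction of samples on which two trees disagree.
dist = (tree_preds[:, None, :] != tree_preds[None, :, :]).mean(axis=2)

# Medoid: the tree with the smallest mean distance to all the others.
representative_idx = int(np.argmin(dist.mean(axis=1)))
representative_tree = forest.estimators_[representative_idx]

print("most representative tree index:", representative_idx)
print("its mean disagreement with the others:", dist[representative_idx].mean())
```

This compares trees by what they predict rather than by their structure, so two trees with different splits but identical predictions count as distance zero. If the structure itself matters, a different metric would be needed.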
If your dataset has n rows, sample m data points at random, where m < n. Repeat the sampling, say, 100 times, and train/test an RF on each sample. You can then average the accuracy/F1 score to see the performance. Another way is to do a StratifiedKFold test. If you plot the distribution of accuracy scores and it looks roughly Gaussian, you can more or less say your predictions will be consistent/reliable.
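The two checks described above could be sketched as follows with scikit-learn. The synthetic data, the subsample size `m`, the 70/30 split inside each subsample, and the reduced number of repeats are assumptions chosen to keep the example small, so adjust them to your data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic binary data stands in for the real dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n = X.shape[0]
m = 300          # subsample size, m < n
n_repeats = 20   # the text suggests about 100; kept small so this runs quickly

# 1) Repeated random subsampling: train/test an RF on each subsample.
rng = np.random.default_rng(0)
subsample_scores = []
for _ in range(n_repeats):
    idx = rng.choice(n, size=m, replace=False)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[idx], y[idx], test_size=0.3, stratify=y[idx], random_state=0
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    subsample_scores.append(clf.score(X_te, y_te))  # accuracy on the held-out part

subsample_scores = np.array(subsample_scores)
print(f"subsampling accuracy: mean={subsample_scores.mean():.3f}, "
      f"std={subsample_scores.std():.3f}")

# 2) Stratified k-fold cross-validation, scored with F1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y, cv=cv, scoring="f1"
)
print(f"5-fold F1: mean={cv_scores.mean():.3f}, std={cv_scores.std():.3f}")
```

A histogram of `subsample_scores` (for example with `matplotlib.pyplot.hist`) gives the accuracy distribution mentioned above. A narrow, roughly bell-shaped spread suggests the performance is stable across samples.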