I am new to Python. I am removing multicollinear features by computing the correlation matrix between features, following the same approach as https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html#permutation-importance-with-multicollinear-or-correlated-features. The specific code is below:
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform
from collections import defaultdict
from scipy.cluster.hierarchy import linkage, fcluster

corr = spearmanr(data_all[feature_name]).correlation
# Ensure the correlation matrix is symmetric
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)

# We convert the correlation matrix to a distance matrix before performing
# hierarchical clustering using Ward's linkage.
distance_matrix = 1 - np.abs(corr)
y = squareform(distance_matrix)  # condensed (1-D) distance vector
dist_linkage = linkage(y, method='ward')
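To see the heights numerically rather than only on the plot, I also printed the third column of the linkage matrix, which I understand holds the distance at which each merge occurs:

# Each row of dist_linkage is (cluster_i, cluster_j, merge_height, size).
print("max pairwise distance:", y.max())
print("max merge height:", dist_linkage[:, 2].max())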
I got the result below:
The height of the dendrogram is greater than 1, even though all values in y are at most 1. Is this result reasonable? If so, how is the height defined in hierarchy.linkage?
I double-checked that the maximum value of y is indeed smaller than 1. My understanding was that for any linkage method, whether single (minimum), complete (maximum), average, or squared error ("ward"), the merge heights should not exceed the largest distance in y. Moreover, the dendrogram in the scikit-learn example linked above also shows heights greater than 1. I am quite confused by this result.
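To check whether this depends on the linkage method, here is a small self-contained sketch with random data (the matrix X below is a hypothetical stand-in for my data_all[feature_name]):

import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))  # hypothetical stand-in for my feature matrix
corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)
y = squareform(1 - np.abs(corr))  # condensed distances, all values below 1

for method in ["single", "complete", "average", "ward"]:
    heights = linkage(y, method=method)[:, 2]
    print(method, "max merge height:", heights.max(), "max y:", y.max())

Since single, complete, and average merge heights are by construction bounded by the largest pairwise distance, I would expect only "ward" to exceed max(y) in this sketch, which suggests the behaviour is specific to how Ward's method defines the merge height.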