r corrplot with clustering: default dissimilarity measure for correlation matrix


I used the R package corrplot to visualize the correlation matrix of my data. I clustered the variables using the built-in option order="hclust". The call looked like this (plus various settings for titles, axes, etc.):

corrplot(Rbas,type="upper",order="hclust",method="ellipse")

Now I am doing further analysis and visualization with other packages, and I need the results to be consistent. In particular, I have to repeat the clustering of the correlation matrix manually. The corrplot documentation leaves one point unclear: which dissimilarity measure does corrplot use by default? Is it 1-|corr|, sqrt(1-corr^2), or something else? The literature offers several choices, for example as described in this article.
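To make the candidate measures concrete, here is a minimal base-R sketch that builds each of them and compares the resulting hclust orderings. The built-in mtcars data set and the names d_signed, d_abs and d_sqrt are only illustrative assumptions, not part of the original data or of corrplot.

```r
# Illustrative correlation matrix; mtcars stands in for the real data (assumption).
R <- cor(mtcars)

# Three candidate dissimilarities derived from the correlation matrix.
d_signed <- as.dist(1 - R)                   # treats r = -1 as maximally distant
d_abs    <- as.dist(1 - abs(R))              # treats r = -1 and r = +1 alike
d_sqrt   <- as.dist(sqrt(pmax(0, 1 - R^2)))  # pmax guards against tiny negative values

# Leaf order produced by hierarchical clustering (default method: complete linkage).
orders <- list(
  signed = hclust(d_signed)$order,
  abs    = hclust(d_abs)$order,
  sqrt   = hclust(d_sqrt)$order
)

# Variable names in each ordering, side by side.
print(sapply(orders, function(o) colnames(R)[o]))
```

The measures can give different orderings whenever the matrix contains strongly negative correlations, because 1-corr places such pairs far apart while the other two place them close together.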

Update, to answer my own question: I tried a guess, using the dissimilarity measure 1-corr. That is, I ran the following (Rbas is the correlation matrix):

dissim1<-1-Rbas
dist1<-as.dist(dissim1)
plot(hclust(dist1))

and recovered the same ordering of variables as the one produced by the default corrplot call with order="hclust". But it is not clear whether this is really the mechanism corrplot uses, and whether the result will hold for any other matrix.

Marco Sandri On BEST ANSWER

The function used by corrplot to reorder variables is corrMatOrder (try ?corrMatOrder).
It returns a single permutation vector.
When order="hclust" is selected in corrplot, corrMatOrder invokes the corrplot:::reorder_using_hclust function:

function (corr, hclust.method) 
{
    hc <- hclust(as.dist(1 - corr), method = hclust.method)
    order.dendrogram(as.dendrogram(hc))
}

This function uses 1-corr as the dissimilarity measure.
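To reuse the same ordering outside corrplot, the function above can be copied into a small base-R helper. The sketch below is an assumption-laden illustration: hclust_order is a hypothetical name (not a corrplot function), mtcars stands in for the real data, and complete linkage is assumed because, as far as I know, it is the default hclust.method in corrplot as well as the default in hclust. That shared default would explain why the plain hclust(dist1) call in the question gave the same ordering.

```r
# Base-R copy of the ordering step quoted above.
# `hclust_order` is a hypothetical helper name, not part of corrplot.
hclust_order <- function(corr, hclust.method = "complete") {
  hc <- hclust(as.dist(1 - corr), method = hclust.method)
  order.dendrogram(as.dendrogram(hc))
}

R <- cor(mtcars)                    # illustrative correlation matrix (assumption)
manual_order <- hclust_order(R)     # permutation of variable indices
R_sorted <- R[manual_order, manual_order]  # reordered matrix for use in other packages

# Optional cross-check against corrplot, run only if the package is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  pkg_order <- corrplot::corrMatOrder(R, order = "hclust",
                                      hclust.method = "complete")
  print(all(manual_order == pkg_order))
}
```

If a different hclust.method is passed to corrplot, the same value has to be passed to the helper, otherwise the two orderings need not agree.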