Is Pearson correlation faster than Spearman correlation in R?

I would like to determine many correlations (millions) between pairs of columns, so I am worried about computing time.

I suspect that Pearson correlations (based on values) are faster to calculate in R than Spearman correlations (based on ranks). Is that correct?

How can I find out, please? Thank you.

Best answer, by Till

You can use the rbenchmark package for this.

library(rbenchmark)

1,000 rows, 100 repetitions

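# Two independent standard-normal vectors of length 1,000
x1 <- rnorm(1000)
y1 <- rnorm(1000)

# Time both methods on the same data, 100 runs each
benchmark(
  spearman = cor(x1, y1, method = "spearman"),
  pearson  = cor(x1, y1, method = "pearson"),
  replications = 100
)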
#>       test replications elapsed relative user.self sys.self user.child
#> 2  pearson          100   0.002        1     0.002        0          0
#> 1 spearman          100   0.014        7     0.013        0          0
#>   sys.child
#> 2         0
#> 1         0

1,000,000 rows, 100 repetitions

# Same comparison with vectors of length 1,000,000
x2 <- rnorm(1000000)
y2 <- rnorm(1000000)

benchmark(
  spearman = cor(x2, y2, method = "spearman"),
  pearson  = cor(x2, y2, method = "pearson"),
  replications = 100
)
#>       test replications elapsed relative user.self sys.self user.child
#> 2  pearson          100   0.717    1.000     0.717    0.001          0
#> 1 spearman          100  37.336   52.073    36.797    0.537          0
#>   sys.child
#> 2         0
#> 1         0

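This confirms your assumption: Pearson is considerably faster than Spearman, about 7 times faster at 1,000 rows and about 52 times faster at 1,000,000 rows. The gap widens as the number of rows grows, because Spearman has to rank both vectors first, and ranking requires sorting.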
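
Since Spearman's coefficient is the Pearson coefficient computed on ranks, one option for many column pairs might be to rank every column once and then use the Pearson method on the ranks. The sketch below illustrates this on made-up data; the matrix `m`, its size, and the variable names are assumptions for illustration, and it assumes there are no missing values. I have not timed it, so please benchmark it on your own data.

```r
# Sketch: Spearman correlation as Pearson correlation of column-wise ranks.
set.seed(1)                               # reproducible toy data
m <- matrix(rnorm(1000 * 20), ncol = 20)  # hypothetical data: 1000 rows, 20 columns

# Rank each column once (ties get average ranks, the default in rank())
r <- apply(m, 2, rank)

# A single call returns the full 20 x 20 matrix of pairwise correlations
spearman_via_ranks <- cor(r)                       # Pearson on the ranks
spearman_direct    <- cor(m, method = "spearman")  # built-in Spearman

# Check that both routes agree up to numerical tolerance
all.equal(spearman_via_ranks, spearman_direct)
```

If you loop over pairs of columns yourself, ranking up front means each column is ranked only once instead of once per pair. Passing a whole matrix to cor() also avoids the overhead of millions of separate calls.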