Is sum or matrix multiplication faster?

1.2k views Asked by At

I have a very simple question, is using sum or matrix multiplication faster to sum a large vector? More precisely, here an example of problem I am trying to speed up:

d <- 1000
X <- matrix(rnorm(d^2), nrow = d)
y <- rnorm(d)

## Solution 1
sum(X%*%y)
## Solution 2
rep(1, d)%*%(X%*%y)

I have tried testing the two with system.time(), but the times jump around each other and I can't get a fix on it. The times are very similar so this question has passed from practical to just inquisitive. Perhaps they are exactly the same time (seems unlikely).

Here's the function I've written to test it:

testSum <- function(d, its){
  X <- matrix(rnorm(d^2), nrow=d)
  y <- rnorm(d)

  store <- matrix(NA, nrow = its, ncol = 3)
  store2 <- matrix(NA, nrow = its, ncol = 3)

  for(i in 1:its) store[i, ] <- system.time(sum(X%*%y))[1:3]
  for(i in 1:its) store2[i, ] <- system.time(rep(1, d)%*%(X%*%y))[1:3]

  return(list(sumF = mean(store[, 1]),
              MM = mean(store2[, 1])))
}

testSum(1000, 100)

And the output always looks something like this:

$sumF
[1] 0.01021

$MM
[1] 0.01028

Where the top is using sum and the bottom is using matrix multiplication. Any hints, suggestions are welcome! Thanks!

2

There are 2 answers

2
Karolis Koncevičius On BEST ANSWER

One simple thing to try is using a larger vector.

Using a million.

library(microbenchmark)

A <- rnorm(1000000)
B <- rep(1, 1000000)

system.time(sum(A))
user  system elapsed
0.012   0.000   0.01

system.time(B %*% A)
user  system elapsed
0.044   0.000   0.04

microbenchmark(sum(A), B%*%A)
Unit: microseconds
   expr      min       lq     mean   median       uq      max neval
 sum(A)  899.735  953.361 1021.005 1021.415 1081.348 1885.857   100
B %*% A 2846.589 2946.579 3063.001 2993.341 3132.779 4498.773   100

Using 10 million.

library(microbenchmark)

A <- rnorm(10000000)
B <- rep(1, 10000000)

system.time(sum(A))
user  system elapsed
0.012   0.000   0.011

system.time(B %*% A)
user  system elapsed
0.044   0.000   0.044

microbenchmark(sum(A), B%*%A)
Unit: milliseconds
   expr       min        lq      mean    median       uq      max neval
 sum(A)  8.993579  9.294605  9.975156  9.729226 10.22477 14.29411   100
B %*% A 32.716323 33.818031 35.586381 35.966695 36.86165 41.13194   100

On my machine sum is ~4 times faster.

NOTE: I precomputed the vectors instead of multiplying vector and matrix to get a vector. Also precomputed the vector of ones to make the comparison more fair.

0
jeremycg On

You might be interested in the microbenchmark package, an easy way to time small functions a lot of times:

microbenchmark::microbenchmark(sum(X%*%y),rep(1, d)%*%(X%*%y))

gives me:

Unit: milliseconds
                    expr      min       lq     mean   median       uq      max neval
            sum(X %*% y) 10.01472 10.52420 14.25944 11.11969 13.67134 74.26345   100
 rep(1, d) %*% (X %*% y) 10.13382 10.55444 12.99910 10.87629 12.95769 50.38268   100

So on my (slow) laptop, they are more or less the same.