I have a very simple question: is using sum or matrix multiplication faster to sum a large vector? More precisely, here is an example of the problem I am trying to speed up:
d <- 1000
X <- matrix(rnorm(d^2), nrow = d)
y <- rnorm(d)
## Solution 1
sum(X%*%y)
## Solution 2
rep(1, d)%*%(X%*%y)
I have tried timing the two with system.time(), but the results fluctuate from run to run and I can't tell which is faster. The times are so similar that this question has passed from practical to merely inquisitive. Perhaps they take exactly the same time (which seems unlikely).
Here's the function I've written to test it:
testSum <- function(d, its){
  X <- matrix(rnorm(d^2), nrow = d)
  y <- rnorm(d)
  # One row per iteration; columns are user, system, and elapsed time
  store <- matrix(NA, nrow = its, ncol = 3)
  store2 <- matrix(NA, nrow = its, ncol = 3)
  for(i in 1:its) store[i, ] <- system.time(sum(X%*%y))[1:3]
  for(i in 1:its) store2[i, ] <- system.time(rep(1, d)%*%(X%*%y))[1:3]
  # Mean user time (column 1) for each approach
  return(list(sumF = mean(store[, 1]),
              MM = mean(store2[, 1])))
}
testSum(1000, 100)
And the output always looks something like this:
$sumF
[1] 0.01021
$MM
[1] 0.01028
The top value is for sum and the bottom is for matrix multiplication. Any hints or suggestions are welcome! Thanks!
One simple thing to try is using a larger vector.
Using a million.
Using 10 million.
On my machine, sum is about 4 times faster.
NOTE: I precomputed the vectors instead of multiplying the matrix and vector to get a vector. I also precomputed the vector of ones to make the comparison fairer.
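For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how it could be set up. It is not the exact code behind the timings above: the seed, the repetition count `reps`, and the variable names `v` and `ones` are my own assumptions. Both vectors are built before timing starts, and each approach is repeated inside a single system.time() call so the measurement is not dominated by timer resolution.

```r
# Sketch: compare sum() with a dot product against a precomputed vector of ones.
set.seed(1)               # assumed seed, only for reproducibility
n    <- 1e6               # vector length; try 1e7 as well
v    <- rnorm(n)          # precomputed vector to be summed
ones <- rep(1, n)         # precomputed so its allocation is not timed
reps <- 50                # assumed number of repetitions per approach

# Time many repetitions in one call; a single call is too short to measure reliably.
t_sum <- system.time(for (i in seq_len(reps)) s1 <- sum(v))[["elapsed"]]
t_mm  <- system.time(for (i in seq_len(reps)) s2 <- ones %*% v)[["elapsed"]]

# Both approaches should agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(s1, as.numeric(s2))))

# Mean elapsed seconds per call for each approach.
print(c(sum = t_sum, matmul = t_mm) / reps)
```

The ratio will depend on the machine and on which BLAS library R is linked against, so treat any single run as indicative only.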