Efficiently compute the row sums of a 3d array in R

Question

6.5k views. Asked by Gavin Simpson on 27 February 2011 at 19:36

Consider the array a:

> a <- array(c(1:9, 1:9), c(3,3,2))
> a
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

How do we efficiently compute the row sums of the matrices indexed by the third dimension, so that the result is the following?

     [,1] [,2]
[1,]   12   12
[2,]   15   15
[3,]   18   18

The column sums are easy via the 'dims' argument of colSums():

> colSums(a, dims = 1)

but I cannot find a way to use rowSums() on the array to achieve the desired result, as it has a different interpretation of 'dims' to that of colSums().
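To make the difference concrete, here is a small sketch of how the two functions interpret 'dims', following the description in ?colSums: colSums() sums over dimensions 1:dims, whereas rowSums() sums over dimensions dims+1 onwards. The variable names cs, rs1 and rs2 are only for illustration. Neither setting of 'dims' makes rowSums() collapse the second dimension alone.

```r
a <- array(c(1:9, 1:9), c(3, 3, 2))

# colSums(): sums over dimensions 1:dims, so dims = 1 collapses the rows
# and leaves a 3 x 2 matrix (columns x slices).
cs <- colSums(a, dims = 1)

# rowSums(): sums over dimensions (dims + 1) onwards.
# dims = 1 collapses columns AND slices together: a vector of length 3.
rs1 <- rowSums(a, dims = 1)

# dims = 2 collapses only the slices: a 3 x 3 matrix, a[, , 1] + a[, , 2].
rs2 <- rowSums(a, dims = 2)

dim(cs)   # 3 2
rs1       # 24 30 36
dim(rs2)  # 3 3
```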

It is simple to compute the desired row sums using:

> apply(a, 3, rowSums)
     [,1] [,2]
[1,]   12   12
[2,]   15   15
[3,]   18   18

but that is just hiding the loop. Are there other efficient, truly vectorised, ways of computing the required row sums?


There are 4 answers

Spacedman On 27 February 2011 at 20:40

If you have a multi-core system, you could write a simple C function and make use of the OpenMP parallel threading library. I've done something similar for a problem of mine and got an eightfold speed-up on an 8-core system. The code will still work on a single-processor system and will even compile on a system without OpenMP, perhaps with a smattering of #ifdef _OPENMP here and there.

Of course, it's only worth doing if you know that's what's taking most of the time. Do profile your code before optimising.

Fojtasek On 27 February 2011 at 20:49

You could chop up the array into two dimensions, compute row sums on that, and then put the output back together the way you want it. Like so:

rowSums3d <- function(a){
    m <- matrix(a,ncol=ncol(a))
    rs <- rowSums(m)
    matrix(rs,ncol=2)
}

> a <- array(c(1:250000, 1:250000),c(5000,5000,2))
> system.time(rowSums3d(a))
   user  system elapsed 
   1.73    0.17    1.96 
> system.time(apply(a, 3, rowSums))
   user  system elapsed 
   3.09    0.46    3.74

Tony Breyal On 27 February 2011 at 21:14

I don't know about the most efficient way of doing this, but sapply() seems to do well:

a <- array(c(1:9, 1:9), c(3,3,2))
x1 <- sapply(1:dim(a)[3], function(i) rowSums(a[,,i]))
x1
     [,1] [,2]
[1,]   12   12
[2,]   15   15
[3,]   18   18

x2 <- apply(a, 3, rowSums)
all.equal(x1, x2)
[1] TRUE

This gives a speed improvement over both rowSums3d() and apply(), as the following timings show:

> a <- array(c(1:250000, 1:250000),c(5000,5000,2))

> summary(replicate(10, system.time(rowSums3d(a))[3]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.784   2.799   2.810   2.814   2.821   2.862 

> summary(replicate(10, system.time(apply(a, 3, rowSums))[3]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.730   2.755   2.766   2.776   2.788   2.839 

> summary(replicate(10, system.time( sapply(1:dim(a)[3], function(i) rowSums(a[,,i])) )[3]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.840   1.852   1.867   1.872   1.893   1.914

Timings were done on:

# Ubuntu 10.10
# Kernel Linux 2.6.35-27-generic
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)
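
One caveat with the sapply() call above: a[,,i] drops to a plain vector when the array has a single row or a single column, and rowSums() then fails because it needs at least two dimensions. A minimal sketch of a more defensive variant follows. The helper name slice_row_sums is made up for illustration, and it has not been timed against the versions above.

```r
# Hypothetical helper: row sums of each slice along the third dimension.
slice_row_sums <- function(a) {
  d <- dim(a)
  out <- vapply(
    seq_len(d[3]),                    # safe even if there are zero slices
    function(i) {
      # drop = FALSE keeps the slice as an n x p x 1 array, so rowSums()
      # still works when n or p is 1; it sums over dimensions 2 and 3.
      rowSums(a[, , i, drop = FALSE])
    },
    numeric(d[1])                     # each slice yields one value per row
  )
  # vapply() returns a plain vector when there is only one row, so
  # restore the rows x slices shape explicitly.
  matrix(out, nrow = d[1], ncol = d[3])
}

x <- array(seq_len(24), c(3, 4, 2))
all.equal(slice_row_sums(x), apply(x, 3, rowSums))

y <- array(1:6, c(1, 3, 2))           # a single row per slice
slice_row_sums(y)
```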

Gavin Simpson On 27 February 2011 at 22:53 (Accepted Answer)

@Fojtasek's answer, which mentioned splitting up the array, reminded me of the aperm() function, which permutes the dimensions of an array. Since colSums() already does what we want for columns, we can swap the first two dimensions using aperm() and run colSums() on the result.

> colSums(aperm(a, c(2,1,3)))
     [,1] [,2]
[1,]   12   12
[2,]   15   15
[3,]   18   18
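
Why this works can be checked on a non-square array, where rows and columns cannot be confused. The sketch below is only an illustration; the names x, p, via_aperm and via_rowsums are not from the original session. The second variant, which moves the columns last and uses rowSums() with dims = 2, is an alternative permutation added here for comparison and has not been timed.

```r
x <- array(seq_len(24), c(3, 4, 2))   # 3 rows, 4 columns, 2 slices

# aperm(x, c(2, 1, 3)) transposes every slice: p[j, i, k] == x[i, j, k].
p <- aperm(x, c(2, 1, 3))
dim(p)                      # 4 3 2
p[4, 1, 2] == x[1, 4, 2]    # TRUE

# colSums() collapses the first dimension of p, i.e. the columns of x,
# leaving one row sum per (row, slice) pair.
via_aperm <- colSums(p)

# Alternative (an assumption, not part of the original answer): move the
# columns last and let rowSums() collapse them with dims = 2.
via_rowsums <- rowSums(aperm(x, c(1, 3, 2)), dims = 2)

all.equal(via_aperm, apply(x, 3, rowSums))   # TRUE
all.equal(via_aperm, via_rowsums)            # TRUE
```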

Some comparison timings of this and the other suggested R-based answers:

> b <- array(c(1:250000, 1:250000),c(5000,5000,2))
> system.time(rs1 <- apply(b, 3, rowSums))
   user  system elapsed 
  1.831   0.394   2.232 
> system.time(rs2 <- rowSums3d(b))
   user  system elapsed 
  1.134   0.183   1.320 
> system.time(rs3 <- sapply(1:dim(b)[3], function(i) rowSums(b[,,i])))
   user  system elapsed 
  1.556   0.073   1.636
> system.time(rs4 <- colSums(aperm(b, c(2,1,3))))
   user  system elapsed 
  0.860   0.103   0.966

So on my system the aperm() solution appears marginally faster:

> sessionInfo()
R version 2.12.1 Patched (2011-02-06 r54249)
Platform: x86_64-unknown-linux-gnu (64-bit)

However, rowSums3d() doesn't give the same answers as the other solutions:

> all.equal(rs1, rs2)
[1] "Mean relative difference: 0.01999992"
> all.equal(rs1, rs3)
[1] TRUE
> all.equal(rs1, rs4)
[1] TRUE
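
The discrepancy appears to come from the reshape: matrix(a, ncol = ncol(a)) refills the data column by column into a matrix with more rows, so each row of m mixes elements from different columns and slices of a instead of collecting one row of one slice. On the small 3 x 3 x 2 example it happens to give the right answer only because the two slices are identical. The sketch below shows the problem on an array with distinct slices and gives a reshape-based variant that permutes the dimensions first. rowSums3d_fixed is a hypothetical name, and since it also calls aperm() there is no reason to expect it to beat colSums(aperm(a, c(2,1,3))).

```r
# rowSums3d() as posted in the earlier answer.
rowSums3d <- function(a){
    m <- matrix(a,ncol=ncol(a))
    rs <- rowSums(m)
    matrix(rs,ncol=2)
}

# Hypothetical corrected variant: move the columns to the last dimension
# so that each row of the reshaped matrix holds one (row, slice) pair.
rowSums3d_fixed <- function(a) {
  d <- dim(a)
  m <- matrix(aperm(a, c(1, 3, 2)), ncol = d[2])
  # One sum per (row, slice) pair, reshaped to rows x slices.
  matrix(rowSums(m), nrow = d[1], ncol = d[3])
}

x <- array(1:18, c(3, 3, 2))   # two distinct slices
expected <- apply(x, 3, rowSums)

isTRUE(all.equal(rowSums3d(x), expected))        # FALSE
isTRUE(all.equal(rowSums3d_fixed(x), expected))  # TRUE
```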
