Iterating conditional sums in R

Question

Asked by Vincent on 23 December 2016 at 03:26

I have a series of two-dimensional numeric matrices containing only 1s and 0s (so they could also be treated as logical arrays). For each matrix, I want to generate a vector with one element per column. Each element should be the sum of the row totals, taken over the rows where that column's entry is 1.

Here's what I have for single columns:

# Generate sample data
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
# Calculate row sums
scores <- rowSums(dataset)
# Calculate the desired statistic for column 1
M1_1 <- sum(scores[which(dataset[, 1] == 1)])
# Calculate the same statistic for column 2
M1_2 <- sum(scores[which(dataset[, 2] == 1)])

Rather than writing M1_1, M1_2, ..., M1_n by hand, I want a single expression that computes this statistic for every column. I suspect it's a really simple thing to do, but I haven't been able to figure out how. Any guidance would be appreciated.

There are 3 answers

Sandipan Dey On 23 December 2016 at 08:00

Matrix multiplication also works. The output below comes from data generated with seed 123, so it differs from the numbers in the other answers:

as.numeric(matrix(scores, nrow=1) %*% dataset)
# [1] 53 72 16 51 43 49 51 49 30 66
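Since the question's sample data is random, one way to check this against the original column-by-column calculation is to fix a seed first. The following is a minimal sketch, assuming seed 123 as mentioned above; the names by_matmul and by_loop are made up for illustration, and no output is shown because the sampled values can depend on the R version.

```r
set.seed(123)  # assumed seed; any fixed value makes the run repeatable
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
scores <- rowSums(dataset)

# A 1 x 19 row vector times the 19 x 10 matrix gives a 1 x 10 result;
# as.numeric() drops the matrix dimensions
by_matmul <- as.numeric(matrix(scores, nrow = 1) %*% dataset)

# Reference: the statistic from the question, computed one column at a time
by_loop <- sapply(seq_len(ncol(dataset)),
                  function(j) sum(scores[dataset[, j] == 1]))

all.equal(by_matmul, by_loop)
```

The two agree because each entry is 0 or 1, so multiplying a score by the entry either keeps it or zeroes it out before summing.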

Ronak Shah On 23 December 2016 at 04:01

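We can multiply the matrix of 0s and 1s by the corresponding scores and then sum column-wise. Because scores has one element per row, R recycles it down each column, so every entry in row i is multiplied by scores[i].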

colSums(dataset * scores)

#[1] 44 58 50 53 42 60 43 46 55 45
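To see the recycling at work, here is a small hand-checkable sketch. The 3 x 2 matrix m and the names s and result are toy values invented for illustration, not part of the question's data.

```r
# Toy data: 3 rows, 2 columns, filled column-wise
m <- matrix(c(1, 1, 1,
              0, 0, 1), nrow = 3)
s <- rowSums(m)        # row totals: 1 1 2

# s is recycled down each column, so row i is scaled by s[i]
m * s
#      [,1] [,2]
# [1,]    1    0
# [2,]    1    0
# [3,]    2    2

result <- colSums(m * s)
result
# [1] 4 2
```

Column 1 has a 1 in every row, so its value is 1 + 1 + 2 = 4. Column 2 has a 1 only in row 3, so its value is 2.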

akrun (Accepted Answer) On 23 December 2016 at 03:43

We can loop over the columns with sapply and, for each one, sum the scores of the rows where the entry is 1:

as.vector(sapply(split(dataset, col(dataset)), function(x) sum(scores[x==1])))
#[1] 56 47 50 53 55 48 75 67 40 55

Or using apply

apply(dataset, 2, function(x) sum(scores[x==1]))
#[1] 56 47 50 53 55 48 75 67 40 55

Or, as a vectorized approach, expand 'scores' explicitly to the shape of 'dataset' and multiply the two, rather than relying on implicit recycling (which can silently misalign values):

colSums(scores[row(dataset)]*dataset)
#[1] 56 47 50 53 55 48 75 67 40 55
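As an illustration of how recycling can go wrong (this scenario is an assumption for the sake of the example, not something from the question), suppose the same data were stored transposed. The toy matrix m and the names tm, recycled and explicit are invented here.

```r
# Toy data: 3 rows, 2 columns, filled column-wise
m <- matrix(c(1, 1, 0,
              0, 1, 0), nrow = 3)
s <- rowSums(m)                      # one score per row of m: 1 2 0

tm <- t(m)                           # same data, transposed to 2 x 3

# Implicit recycling walks down the columns of tm, so s is misaligned.
# length(tm) is a multiple of length(s), so R gives no warning.
recycled <- rowSums(tm * s)

# Explicit indexing: col(tm) gives each cell's column number, which is
# the row of m that the score belongs to.
explicit <- rowSums(tm * s[col(tm)])

recycled         # 1 1  (wrong)
explicit         # 3 2
colSums(m * s)   # 3 2  (the untransposed calculation, for comparison)
```

With the original orientation, recycling and explicit indexing give the same answer. The explicit form simply makes the alignment visible in the code.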

Or another intuitive option is sweep

colSums(sweep(dataset, 1, scores, FUN = "*"))
#[1] 56 47 50 53 55 48 75 67 40 55

These match the values computed one column at a time in the OP's post:

M1_1
#[1] 56
M1_2
#[1] 47

Or, as @user20650 commented, a concise option is crossprod (note that it returns a 1 x 10 matrix rather than a plain vector):

crossprod(scores, dataset)

Or without even calculating 'scores' in a different step

rowSums(crossprod(dataset))
#[1] 56 47 50 53 55 48 75 67 40 55
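The last form works because crossprod(dataset) is t(dataset) %*% dataset, whose entry [j, k] counts the rows where columns j and k are both 1. Summing row j over all k therefore adds up, for each row where column j is 1, that row's total. A minimal sketch checking that the crossprod variants agree with apply; the seed and the names via_scores, via_gram and via_apply are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the check is repeatable
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
scores <- rowSums(dataset)

# crossprod(scores, dataset) is a 1 x 10 matrix; drop() makes it a vector
via_scores <- drop(crossprod(scores, dataset))

# No separate 'scores' step: row sums of the 10 x 10 co-occurrence matrix
via_gram <- rowSums(crossprod(dataset))

# Reference: the apply version from above
via_apply <- apply(dataset, 2, function(x) sum(scores[x == 1]))

all.equal(via_scores, via_apply)
all.equal(via_gram, via_apply)
```

For large matrices the 10 x 10 (in general ncol x ncol) intermediate in the second form may be wasteful, so crossprod(scores, dataset) is likely the lighter of the two.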
