Iterating conditional sums in R

I have a series of two-dimensional numeric matrices containing only 1s and 0s (so they could also be treated as logical arrays). For each matrix, I want to generate a vector with one element per column. Each element should be the sum of the row totals, taken over the rows where that column's entry is 1.

Here's what I have for single columns:

# Generate sample data
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
# Calculate row sums
scores <- rowSums(dataset)
# Calculate desired statistic for column 1
M1_1 <- sum(scores[which(dataset[, 1] == 1)])
# Calculate same statistic for column 2
M1_2 <- sum(scores[which(dataset[, 2] == 1)])

Obviously, instead of writing M1_1, M1_2, ..., M1_n by hand, I want a single computation that iterates through every column. I suspect it's really simple, but I haven't been able to figure out how to do it. Any guidance would be appreciated.

There are 3 answers

akrun (BEST ANSWER)

We can loop over the columns with sapply and, for each one, sum the scores of the rows where the entry is 1

as.vector(sapply(split(dataset, col(dataset)), function(x) sum(scores[x==1])))
#[1] 56 47 50 53 55 48 75 67 40 55

Or using apply

apply(dataset, 2, function(x) sum(scores[x==1]))
#[1] 56 47 50 53 55 48 75 67 40 55

Or, as a vectorized approach, replicate 'scores' to the full size of 'dataset' with row() and multiply elementwise, so that nothing relies on recycling (which can be dangerous at times)

colSums(scores[row(dataset)]*dataset)
#[1] 56 47 50 53 55 48 75 67 40 55

Or another intuitive option is sweep

colSums(sweep(dataset, 1, scores, FUN = "*"))
#[1] 56 47 50 53 55 48 75 67 40 55

For comparison, the values from the OP's own code for the first two columns match

M1_1
#[1] 56
M1_2
#[1] 47

Or as @user20650 commented, a concise option is crossprod (note that it returns a 1 x 10 matrix rather than a plain vector)

crossprod(scores, dataset)

Or without even calculating 'scores' in a different step

rowSums(crossprod(dataset))
#[1] 56 47 50 53 55 48 75 67 40 55
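
The outputs above depend on the random sample, so they cannot be compared across sessions. As a minimal sketch of how one might confirm that the options agree, the snippet below fixes a seed (42 is an arbitrary choice, not from the original post) and compares each result against the apply version. The `via_*` names are introduced here for illustration only.

```r
# Sketch: check that the approaches above agree on one fixed sample.
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
dataset <- matrix(sample(0:1, size = 190, replace = TRUE), nrow = 19, ncol = 10)
scores <- rowSums(dataset)

# Reference result: explicit per-column computation
via_apply <- apply(dataset, 2, function(x) sum(scores[x == 1]))

# Alternatives from the answer
via_colsums   <- colSums(scores[row(dataset)] * dataset)
via_sweep     <- colSums(sweep(dataset, 1, scores, FUN = "*"))
via_crossprod <- as.vector(crossprod(scores, dataset))  # drop the 1 x 10 matrix shape
via_gram      <- rowSums(crossprod(dataset))            # no separate 'scores' step

# Compare every alternative against the reference
results <- list(via_colsums, via_sweep, via_crossprod, via_gram)
all_agree <- all(vapply(results,
                        function(r) isTRUE(all.equal(r, via_apply)),
                        logical(1)))
print(all_agree)
```

The last variant works because crossprod(dataset) is t(dataset) %*% dataset, and taking its row sums amounts to multiplying by a vector of ones: dataset %*% ones is exactly 'scores', so the result is t(dataset) %*% scores.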
Sandipan Dey

Matrix multiplication will also work (the output below is from data generated with seed 123, i.e. with set.seed(123) called before the sample() step):

as.numeric(matrix(scores, nrow=1) %*% dataset)
# [1] 53 72 16 51 43 49 51 49 30 66
Ronak Shah

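We can just multiply the matrix of 0s and 1s by the corresponding scores and then take the column-wise sums. This works because R recycles 'scores' down each column, so every entry in row i is multiplied by scores[i]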

colSums(dataset * scores)

#[1] 44 58 50 53 42 60 43 46 55 45
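
To illustrate why the multiplication lines up with the rows, here is a small sketch on a toy matrix. The names `m`, `w` and `v` are hypothetical and not part of the original data. R stores matrices column by column, so a vector whose length equals the number of rows is recycled once per column. A vector of a different length is recycled just as silently, which is the kind of pitfall the earlier answer warns about.

```r
# Toy example (hypothetical names): 3 rows, 2 columns, all ones
m <- matrix(1, nrow = 3, ncol = 2)

# One weight per row: recycled down each column, so row i is scaled by w[i]
w <- c(10, 20, 30)
by_row <- m * w
print(by_row)           # both columns are 10, 20, 30
print(colSums(by_row))  # 60 60

# One value per column does NOT scale columns: it is still recycled
# in column-major order, and no warning is raised because 6 is a multiple of 2
v <- c(1, 2)
by_col_attempt <- m * v
print(by_col_attempt)   # column 1 is 1, 2, 1 and column 2 is 2, 1, 2
```

In the question's setting the length of 'scores' equals nrow(dataset), so colSums(dataset * scores) does what is intended.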