Returning columns containing maximum values across all the variables in a data frame

4.2k views Asked by At
zone_id=1:6
v1=c(12,22,31,12,5,17)
v2=c(15,22,28,16,18,21)
v3=c(18,10,14,9,10,17)
v4=c(20,3,2,5,12,21)
mydata=data.frame(zone_id,v1,v2,v3,v4)

I have a dataframe a crude model of which can be made using the above code. It consists of rows of data relating to geographical areas. I have variables (4 in this example, but 69 in my actual dataset) which contain integers which are observations in those areas. For each zone_id I want to identify the variable from V1 to V4 which contains the maximum value. Where there is a tie I want to return the names of each of the variables that contain the tied maximum value. So for zone 1, I want to return V4, for zone 2 I want to return V1 and V2 and so on.

I am very new to R and have been unable to get to first base with this. I've explored R help files and thought there may be a solution using sweep? Any help appreciated.

2

There are 2 answers

0
James On

One method is to use rank. Note the minus sign before the data as the default ordering is low->high

x <- apply(-mydata[,-1],1,rank,ties.method="min")
x
   [,1] [,2] [,3] [,4] [,5] [,6]
v1    4    1    1    2    4    3
v2    3    1    2    1    1    1
v3    2    3    3    3    3    3
v4    1    4    4    4    2    1

And you could extract the names with sapply:

sapply(mydata$zone_id,function(y) rownames(x)[x[,y]==1])
[[1]]
[1] "v4"

[[2]]
[1] "v1" "v2"

[[3]]
[1] "v1"

[[4]]
[1] "v2"

[[5]]
[1] "v2"

[[6]]
[1] "v2" "v4"
6
Gavin Simpson On

You could use the which(x == max(x)) idiom here and use apply() to run this against each row:

apply(mydata[, -1], 1, function(x) which(x == max(x)))

which gives:

> apply(mydata[, -1], 1, function(x) which(x == max(x)))
[[1]]
v4 
 4 

[[2]]
v1 v2 
 1  2 

[[3]]
v1 
 1 

[[4]]
v2 
 2 

[[5]]
v2 
 2 

[[6]]
v2 v4 
 2  4

The list contains the vectors of indices of the variable(s) that are maximal, and the names of those vectors can be extracted using names() to give the actual variable ID:

> out <- apply(mydata[, -1], 1,  function(x) which(x == max(x)))
> names(out[[2]])
[1] "v1" "v2"
> lapply(out, names)
[[1]]
[1] "v4"

[[2]]
[1] "v1" "v2"

[[3]]
[1] "v1"

[[4]]
[1] "v2"

[[5]]
[1] "v2"

[[6]]
[1] "v2" "v4"

If your data might contain NA then we need to be a little bit more clever, e.g.

apply(mydata[, -1], 1, 
      function(x, na.rm = FALSE)
          which(x == max(x, na.rm = na.rm)),
      na.rm = TRUE)

wherein we can pass in whether to ignore the NA or not.