R: Rank-function with two variables and ties.method random


Is there a way in R to use the rank function (or something similar) with multiple criteria and a ties.method?

Normally rank is used to rank values in a vector and if there are ties you can use one of the ties methods ("average", "random", "first", ...). But when ranking a column in a matrix, I would like to use multiple columns and one of the ties methods.

A minimal example:

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
m <- cbind(x = x, y = y, z = z)

Imagine I want to rank the y-values in the above matrix. But if there are ties, I want the function to look at the z-values. If there still are ties after that, then I want to use the ties.method = "random"-parameter.

In other words, a possible outcome could be:

       x y   z
 [1,]  1 1 0.2
 [2,]  8 1 0.7
 [3,]  5 2 0.2
 [4,]  9 3 0.3
 [5,] 10 3 0.3
 [6,]  2 4 0.8
 [7,]  4 5 0.4
 [8,]  3 5 0.5
 [9,]  6 8 0.1
[10,]  7 8 0.1

But it could also be this:

       x y   z
 [1,]  1 1 0.2
 [2,]  8 1 0.7
 [3,]  5 2 0.2
 [4,] 10 3 0.3
 [5,]  9 3 0.3
 [6,]  2 4 0.8
 [7,]  4 5 0.4
 [8,]  3 5 0.5
 [9,]  7 8 0.1
[10,]  6 8 0.1

Notice how the fourth and fifth rows are swapped (as are the ninth and tenth). I've been able to get this outcome with the order function, i.e. m[order(m[,2], m[,3], sample(length(x))),], but I'd like to get the rank values, not the row indices of a sorted matrix.

If you need elaboration on why I need the rank-values, feel free to ask and I'll edit the question with extra details. For now I think the minimal example will do.

EDIT: Changed dataframe to matrix as @alistaire pointed out.


There are 3 answers

2
Weihuang Wong On BEST ANSWER

Since order(order(x)) gives the same result as rank(x) when x has no ties (see Why does order(order(x)) equal rank(x) in R?), and a random last key breaks any remaining ties, you could just do

order(order(y, z, runif(length(y))))

to get the rank values.
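
To make this concrete, here is a small sketch using the data from the question. The set.seed() call and the sanity checks are my additions for illustration only; they are not needed for the ranking itself.

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)

set.seed(42)  # only to make the random tie-breaking reproducible
r <- order(order(y, z, runif(length(y))))

# r[i] is the rank of row i; rows with identical (y, z) get
# distinct ranks in a random order
cbind(y, z, rank = r)

# Sanity checks: the ranks are a permutation of 1..n, and sorting
# by rank leaves the rows ordered by y
stopifnot(setequal(r, seq_along(y)))
stopifnot(!is.unsorted(y[order(r)]))
```

Rows 9 and 10 (both y = 3, z = 0.3) share ranks 4 and 5 in some order, and rows 6 and 7 share ranks 9 and 10; every other rank is fully determined by y and z.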


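Here's a more involved approach that lets you choose the tie-breaking rule by name (only "average" and "random" are defined here). It requires dplyr:

library(dplyr)
rank2 <- function(df, key1, key2, ties.method) {
  # Tie-breakers applied to the provisional ranks within each (key1, key2) group
  average <- function(x) mean(x)
  # x[sample.int(...)] rather than sample(x, ...): sample(5, 1) would draw from 1:5
  random <- function(x) x[sample.int(length(x))]
  tie_fun <- get(ties.method)  # look the tie-breaker up by name
  df$r <- order(order(df[[key1]], df[[key2]]))
  # group_by_() is deprecated; across(all_of()) selects the key columns by name
  group_by(df, across(all_of(c(key1, key2)))) %>% mutate(rr = tie_fun(r))
}

# rank2() expects a data frame, so convert the matrix m from the question
rank2(as.data.frame(m), "y", "z", "average")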
# Source: local data frame [10 x 5]
# Groups: y, z [8]
#        x     y     z     r    rr
#    <dbl> <dbl> <dbl> <int> <dbl>
# 1      1     1   0.2     1   1.0
# 2      2     4   0.8     6   6.0
# 3      3     5   0.5     8   8.0
# 4      4     5   0.4     7   7.0
# 5      5     2   0.2     3   3.0
# 6      6     8   0.1     9   9.5
# 7      7     8   0.1    10   9.5
# 8      8     1   0.7     2   2.0
# 9      9     3   0.3     4   4.5
# 10    10     3   0.3     5   4.5
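
For comparison, the same numbers seem to be reachable in base R without dplyr. The sketch below is an assumption on my part, not part of the answer above: the helper name rank2_base is made up, and it assumes non-empty numeric keys without NA values. It gives every distinct (key1, key2) pair a dense integer id and hands those ids to rank(), so any ties.method that rank() accepts should work.

```r
# Hypothetical helper for illustration; assumes non-empty numeric keys, no NAs
rank2_base <- function(key1, key2, ties.method = "average") {
  o <- order(key1, key2)
  # TRUE wherever the (key1, key2) pair changes along the sorted order
  changed <- c(TRUE, diff(key1[o]) != 0 | diff(key2[o]) != 0)
  id <- integer(length(key1))
  id[o] <- cumsum(changed)  # dense id per distinct pair, in sorted order
  rank(id, ties.method = ties.method)
}

y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)

rank2_base(y, z, "average")
# 1.0 6.0 8.0 7.0 3.0 9.5 9.5 2.0 4.5 4.5  (same as the rr column above)

rank2_base(y, z, "random")  # tied pairs get distinct ranks in random order
```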
0
G5W On

Sorry, I misunderstood your question originally. I think that this is what you want. I made one minor change. Specifically, I made your variable df a data frame, not just a matrix.

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
df <- data.frame(x = x, y = y, z = z)

TM = "last"     ## Your desired ties method here.
df[rank(df$z, ties.method=TM),] = df
df = df[order(df$y),]
df
    x y   z
4   1 1 0.2
9   8 1 0.7
3   5 2 0.2
5  10 3 0.3
6   9 3 0.3
10  2 4 0.8
7   4 5 0.4
8   3 5 0.5
1   7 8 0.1
2   6 8 0.1

You could use any of the ties methods available in rank, but I chose "last" here to make it obvious that the ties method changes the order of the tied rows.
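
The code above returns a sorted data frame rather than rank values. One possible way to get the ranks with the same two-step idea, sketched here as an assumption rather than as part of the original answer, is to rank z with the chosen ties method first and then use those ranks as the secondary sort key. This only breaks every tie for methods that produce distinct ranks ("first", "last" or "random"); the variable names z_rank and r are mine.

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)

TM <- "last"                         # "first", "last" or "random"
z_rank <- rank(z, ties.method = TM)  # distinct ranks: ties in z are already broken
r <- order(order(y, z_rank))         # rank by y, then by the z ranks
r
# With TM = "last": 1 6 8 7 3 10 9 2 5 4
```

Sorting by r reproduces the row order shown in the output above (x = 1, 8, 5, 10, 9, 2, 4, 3, 7, 6).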

0
Will T-E On

What about using data.table's frankv function?

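library(data.table)
# Stored as r rather than rank to avoid confusion with base::rank()
r <- frankv(list(m[,"y"], m[,"z"]), ties.method = "random")  # the rank values
m <- m[order(r),]  # optional: sort the matrix by those ranks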