Is there a way in R to use the rank function (or something similar) with multiple criteria and a ties.method?
Normally rank is used to rank values in a vector and if there are ties you can use one of the ties methods ("average", "random", "first", ...). But when ranking a column in a matrix, I would like to use multiple columns and one of the ties methods.
A minimal example:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
m <- cbind(x = x, y = y, z = z)
Imagine I want to rank the y-values in the above matrix. But if there are ties, I want the function to look at the z-values. If there are still ties after that, then I want to use the ties.method = "random" parameter.
In other words, a possible outcome could be:
x y z
[1,] 1 1 0.2
[2,] 8 1 0.7
[3,] 5 2 0.2
[4,] 9 3 0.3
[5,] 10 3 0.3
[6,] 2 4 0.8
[7,] 4 5 0.4
[8,] 3 5 0.5
[9,] 6 8 0.1
[10,] 7 8 0.1
But it could also be this:
x y z
[1,] 1 1 0.2
[2,] 8 1 0.7
[3,] 5 2 0.2
[4,] 10 3 0.3
[5,] 9 3 0.3
[6,] 2 4 0.8
[7,] 4 5 0.4
[8,] 3 5 0.5
[9,] 7 8 0.1
[10,] 6 8 0.1
Notice how the fourth and fifth rows differ (as do the ninth and tenth). I've been able to get the above outcome with the order function, i.e. m[order(m[,2], m[,3], sample(length(x))),], but I'd like to receive the rank values, not the indices of a sorted matrix.
If you need elaboration on why I need the rank-values, feel free to ask and I'll edit the question with extra details. For now I think the minimal example will do.
EDIT: Changed dataframe to matrix as @alistaire pointed out.
Since order(order(x)) gives the same result as rank(x) (see Why does order(order(x)) equal rank(x) in R?), you could just wrap your existing order() call in a second order() to get the rank values.
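As a minimal sketch of that idea, using the question's data: the inner order() sorts by y, then z, then a random tiebreaker, and the outer order() turns the resulting permutation into ranks. The names ord and rnk and the set.seed() call are my own additions for illustration; the seed is there only to make the random tiebreak reproducible.

```r
# Data from the question
x <- 1:10
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
m <- cbind(x = x, y = y, z = z)

set.seed(42)  # assumption: only for reproducibility of the random tiebreak

# Inner order(): row indices sorted by y, then z, then a random key
ord <- order(m[, "y"], m[, "z"], sample(nrow(m)))

# Outer order(): inverts that permutation, so rnk[i] is the rank of row i
rnk <- order(ord)

cbind(m, rank = rnk)
```

Rows that tie on both y and z (x = 9 and 10, and x = 6 and 7) get adjacent ranks in random order; all other ranks are fully determined by y and z.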
Here's a more involved approach that allows you to use methods from ties.method. It requires dplyr: