R - Add row index to a data frame but handle ties with minimum rank


I successfully used the answer in the SO thread "r-how-to-add-row-index-to-a-data-frame-based-on-combination-of-factors", but I need to handle the situation where two (or more) rows are tied.

df <- data.frame(
season = c(2014,2014,2014,2014,2014,2014, 2014, 2014), 
week = c(1,1,1,1,2,2,2,2), 
player.name = c("Matt Ryan","Peyton Manning","Cam Newton","Matthew Stafford","Carson Palmer","Andrew Luck", "Aaron Rodgers", "Chad Henne"), 
fant.pts.passing = c(28,19,29,28,18,22,29,22)
)

df <- df[order(-df$season, df$week, -df$fant.pts.passing),]

df$Index <- ave( 1:nrow(df), df$season, df$week, FUN=function(x) 1:length(x) )

df

In this example, for week 1, Matt Ryan and Matthew Stafford should both get index 2, and Peyton Manning should still get 4.

There are 3 answers

josliber (best answer)

You would want to use the rank function with ties.method="min" within your ave call:

df$Index <- ave(-df$fant.pts.passing, df$season, df$week,
                FUN=function(x) rank(x, ties.method="min"))
df
#   season week      player.name fant.pts.passing Index
# 3   2014    1       Cam Newton               29     1
# 1   2014    1        Matt Ryan               28     2
# 4   2014    1 Matthew Stafford               28     2
# 2   2014    1   Peyton Manning               19     4
# 7   2014    2    Aaron Rodgers               29     1
# 6   2014    2      Andrew Luck               22     2
# 8   2014    2       Chad Henne               22     2
# 5   2014    2    Carson Palmer               18     4
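
To see why ties.method="min" produces the 1, 2, 2, 4 pattern, here is a minimal sketch on a single group. The vector pts and the names r_min, r_first and r_avg are made up for illustration; the values are the week 1 scores in their original order. Negating the scores makes the highest score rank first.

```r
# Hypothetical single group: week 1 scores in the original row order
pts <- c(28, 19, 29, 28)

# Tied values all receive the smallest rank of the tie; the next rank is skipped
r_min <- rank(-pts, ties.method = "min")
r_min
# [1] 2 4 1 2

# "first" breaks ties by position, which matches the original 1:length(x) index
r_first <- rank(-pts, ties.method = "first")
r_first
# [1] 2 4 1 3

# "average" (the default) gives tied values the mean of their ranks
r_avg <- rank(-pts, ties.method = "average")
r_avg
# [1] 2.5 4.0 1.0 2.5
```

Because rank() does not depend on row order, the data frame does not need to be sorted before the ave call; sorting only affects how the result is displayed.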

akrun

You could use the faster frank from data.table and assign (:=) the column by reference:

library(data.table)  # v1.9.5+
setDT(df)[, indx := frank(-fant.pts.passing, ties.method='min'), .(season, week)]
 #   season week      player.name fant.pts.passing indx
 #1:   2014    1       Cam Newton               29    1
 #2:   2014    1        Matt Ryan               28    2
 #3:   2014    1 Matthew Stafford               28    2
 #4:   2014    1   Peyton Manning               19    4
 #5:   2014    2    Aaron Rodgers               29    1
 #6:   2014    2      Andrew Luck               22    2
 #7:   2014    2       Chad Henne               22    2
 #8:   2014    2    Carson Palmer               18    4

Molx

Assuming you want ranks by season and week, this can be easily accomplished with dplyr's min_rank:

library(dplyr)

df %>% group_by(season, week) %>%
  mutate(indx = min_rank(desc(fant.pts.passing)))

#   season week      player.name fant.pts.passing Index indx
# 1   2014    1       Cam Newton               29     1    1
# 2   2014    1        Matt Ryan               28     2    2
# 3   2014    1 Matthew Stafford               28     3    2
# 4   2014    1   Peyton Manning               19     4    4
# 5   2014    2    Aaron Rodgers               29     1    1
# 6   2014    2      Andrew Luck               22     2    2
# 7   2014    2       Chad Henne               22     3    2
# 8   2014    2    Carson Palmer               18     4    4