Finding the statistical mode of a vector: when there is more than one mode, return the last mode

When calculating the statistical mode of a vector, there is often more than one mode:

c(1, 1, 2, 2, 3, 4) # mode is both 1 and 2

In such scenarios, when I want to decide between two (or more) possible values, I use fmode() from the {collapse} package, which offers three tie-breaking methods through its ties argument:

ties

an integer or character string specifying the method to resolve ties between multiple possible modes, i.e. multiple values with the maximum frequency or sum of weights:

Int.  String  Description
1     first   take the first occurring mode.
2     min     take the smallest of the possible modes.
3     max     take the largest of the possible modes.

Example of fmode()

library(collapse)

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2) # 4 modes here: 1, 5, -6, 2

fmode(my_vec, ties = "first")
#> [1] 1
fmode(my_vec, ties = "min")
#> [1] -6
fmode(my_vec, ties = "max")
#> [1] 5

My Question

I'm looking for a "last" method, i.e., whenever there's more than one mode, return the last one. Unfortunately, fmode() doesn't have a "last" method.
So, returning to my example, for the vector:

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)

I want a function that does

custom_mode_func(my_vec, method = "last")
## [1] 2

There is 1 answer

Sebastian (best answer)

The only option you have with collapse is to sort the data beforehand, e.g.:

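library(collapse)
library(magrittr)  # for %>%, in case the installed collapse version does not export it
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%                        # sort by group
  tfm(t = data.table::rowid(g)) %>%      # running row index within each group
  roworder(g, -t) %>%                    # reverse the row order within each group
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))   # first mode of the reversed rows = last mode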
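
For readers without collapse installed, the same idea (reverse within each group, then take the first-occurring mode) can be sketched in plain base R. This is only an illustration: first_mode is a hypothetical helper, not a collapse function, and "last" is assumed here to mean the tied mode whose final occurrence comes latest in the group.

```r
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
g <- gl(2, 5)  # two groups of five elements, as in the example above

# Hypothetical helper: first-occurring mode of a vector
first_mode <- function(x) {
  ux <- unique(x)                          # distinct values in order of appearance
  ux[which.max(tabulate(match(x, ux)))]    # which.max() returns the first maximum
}

# Reverse each group separately, then take its first mode
last_by_group <- tapply(my_vec, g, function(x) first_mode(rev(x)))
last_by_group
#> 1 2
#> 1 2
```

Because tapply() splits the vector before the function is applied, rev() acts on each group on its own here.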

The reason rev doesn't work is that collapse grouping doesn't split the data; it only determines which group each row belongs to, and then computes statistics on all groups simultaneously using running algorithms in C++ (e.g. the grouped computation is done by fmode itself). So in your code, rev is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling fmode.default directly (to save on method dispatch) would probably be the fastest solution. I can think about adding a "last" mode if I find time for that.
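
For a single ungrouped vector, a function with the interface asked for in the question can be sketched in base R. This is a minimal, unoptimized sketch and not part of collapse: the name custom_mode_func and the method argument are taken from the question, and "last" is again assumed to mean the tied mode whose final occurrence comes latest, which is what scanning the reversed vector yields.

```r
# Hypothetical helper, not a collapse function
custom_mode_func <- function(x, method = c("first", "last", "min", "max")) {
  method <- match.arg(method)
  if (method == "last") x <- rev(x)        # scanning from the end turns "last" into "first"
  ux <- unique(x)                          # distinct values in order of appearance
  counts <- tabulate(match(x, ux))         # frequency of each distinct value
  modes <- ux[counts == max(counts)]       # all values tied for the top frequency
  switch(method,
    first = ,                              # falls through to the next branch
    last  = modes[1L],
    min   = min(modes),
    max   = max(modes)
  )
}

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
custom_mode_func(my_vec, method = "last")
#> [1] 2
```

The other three methods reproduce the fmode() results shown in the question (1, -6 and 5), so the sketch can serve as a cross-check on small inputs.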