I'm looking for a way to to calculate the most frequent name per group, and then change all names in this group to the most frequent name, or create a new column with the most frequent name. In the dataset, there may be ties in the frequent names, so in these cases, I would like it to pick either one of the tied names. For example:
The sample data looks like this:
first_name = c("John", "John", "John Smith", "Linda Dawn", "Linda Dawn", "Linda", "Linda", "Linda Dawn", "Jack", "Jack", "Jack B", "Jack B")
id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
dt = data.table(cbind(first_name, id))
first_name id
1: John 1
2: John 1
3: John Smith 1
4: Linda Dawn 2
5: Linda Dawn 2
6: Linda 2
7: Linda 2
8: Linda Dawn 2
9: Jack 3
10: Jack 3
11: Jack B 3
12: Jack B 3
The names in my dataset may also contain middle names.
I've tried using the DescTools::Mode()
function, which works well if there are no ties in most frequent names. Using this method, I get the following output:
dt[, first_name_new := Mode(first_name), by = id]
first_name id first_name_new
1: John 1 John
2: John 1 John
3: John Smith 1 John
4: Linda Dawn 2 Linda Dawn
5: Linda Dawn 2 Linda Dawn
6: Linda 2 Linda Dawn
7: Linda 2 Linda Dawn
8: Linda Dawn 2 Linda Dawn
9: Jack 3 <NA>
10: Jack 3 <NA>
11: Jack B 3 <NA>
12: Jack B 3 <NA>
Does anyone know how to get the <NA>
to state Jack or Jack B, or another technique to perform this task?
Thanks in advance!
Mode()
returns 2 values when there is a tie. Simply take the first one from theMode()
: