I have a data set where the first column is a bunch of ID numbers (some repeat), and the second column is just a bunch of numbers. I need a way to keep each ID number only once, based on the smallest number in the second column.
Row# ID Number
1 10 180
2 12 167
3 12 182
4 12 135
5 15 152
6 15 133
Ex: I only want to keep rows 1, 4, and 6 here and delete the rest.
To select the row that has the minimum 'Number' for each 'ID' group, we can use one of the aggregate-by-group functions. A base R option is `aggregate`. With `aggregate`, we can either use the 'formula' method or specify a `list` of grouping elements/variables with the `by` argument. Using the `formula` method, we get the `min` value of 'Number' for each 'ID'.

Or we can use a faster option with `data.table`. Here, we convert the 'data.frame' to a 'data.table' (`setDT(df1)`) and, grouped by 'ID', get the `min` value of 'Number'.

Or this can also be done with `setorder` to order the 'Number' column, followed by `unique` with the `by` option to select the first non-duplicated 'ID' row (from @David Arenburg's comments).

Or, using `dplyr`, we group by 'ID' and get the subset of rows with `summarise`.

Or we can use `sqldf` syntax to get the subset of data.

Update

If there are multiple columns and you want to get the whole row based on the minimum value of 'Number' for each 'ID', you can use `which.min`. Using `.I` will get the row index, which can then be used for subsetting the rows.

Or, with `dplyr`, we use `slice` to keep only the rows that have the `min` value of 'Number' for each 'ID'.
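
As a minimal sketch of the base R `aggregate` option described above: the snippet rebuilds the question's table as `df1` (the name used in `setDT(df1)`), with the column names `ID` and `Number` taken from the question. The result names `res_formula` and `res_by` are only illustrative.

```r
# Example data from the question
df1 <- data.frame(ID = c(10, 12, 12, 12, 15, 15),
                  Number = c(180, 167, 182, 135, 152, 133))

# Formula method: minimum of Number within each ID
res_formula <- aggregate(Number ~ ID, data = df1, FUN = min)

# Same result, passing a list of grouping variables to 'by'
res_by <- aggregate(df1["Number"], by = list(ID = df1$ID), FUN = min)

print(res_formula)
#   ID Number
# 1 10    180
# 2 12    135
# 3 15    133
```

Note that this returns only the grouping column and the aggregated column, so it suits the two-column case in the question.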
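
The three `data.table` variants mentioned above (grouped `min`, `setorder` plus `unique`, and `.I` with `which.min`) might look roughly like this. It assumes the `data.table` package is installed. The `Label` column is hypothetical and is added only to show which variants keep the other columns. `copy()` is used so that sorting does not reorder `df1` itself.

```r
library(data.table)

df1 <- data.frame(ID = c(10, 12, 12, 12, 15, 15),
                  Number = c(180, 167, 182, 135, 152, 133),
                  Label = letters[1:6])  # hypothetical extra column

setDT(df1)  # convert the data.frame to a data.table by reference

# Minimum Number per ID (other columns are not carried along)
res_min <- df1[, .(Number = min(Number)), by = ID]

# Sort a copy by Number, then keep the first row encountered for each ID
res_unique <- unique(setorder(copy(df1), Number), by = "ID")

# .I gives row indices; take the index of the minimum within each ID,
# then use those indices to subset the full rows
idx <- df1[, .I[which.min(Number)], by = ID]$V1
res_rows <- df1[idx]
```

With the question's data, `idx` picks rows 1, 4, and 6. The `setorder` variant returns the same rows, but sorted by 'Number' rather than in the original order.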
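
For the two `dplyr` options above, a sketch along these lines should work, assuming the `dplyr` package is installed. As before, `Label` is a hypothetical extra column used to show the difference between `summarise` (which returns only the grouping and summary columns) and `slice` (which keeps whole rows).

```r
library(dplyr)

df1 <- data.frame(ID = c(10, 12, 12, 12, 15, 15),
                  Number = c(180, 167, 182, 135, 152, 133),
                  Label = letters[1:6])  # hypothetical extra column

# One summary row per ID holding the minimum Number
res_summarise <- df1 %>%
  group_by(ID) %>%
  summarise(Number = min(Number))

# Keep the full row at the position of the minimum Number within each ID
res_slice <- df1 %>%
  group_by(ID) %>%
  slice(which.min(Number)) %>%
  ungroup()
```

Because `which.min` returns a single position, `slice` keeps one row per 'ID' even when the minimum is tied.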
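
The `sqldf` option could be written as a plain `GROUP BY` query. This is a sketch that assumes the `sqldf` package is installed with its default SQLite backend. The query text is one reasonable way to express it, not necessarily the exact statement from the original answer.

```r
library(sqldf)

df1 <- data.frame(ID = c(10, 12, 12, 12, 15, 15),
                  Number = c(180, 167, 182, 135, 152, 133))

# sqldf looks up df1 by name and runs the SQL against it
res_sql <- sqldf("SELECT ID, MIN(Number) AS Number FROM df1 GROUP BY ID")
```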