I have a data frame that contains duplicated rows with missing values. I want to remove the duplicated rows while retaining the data of a certain column (e.g. `age` in the example below). That column carries more weight in the model than the others, so I would like to keep its values. I tried the methods proposed at Removing duplicate Values in Dataframe in R, but my data frame is large and the missing values are spread over more than one column. Any suggestions will be appreciated.
**Name, age, city, edu, phone**
ali, 23, bali, matric, NA
brad, 24, sofia, inter, NA
ali, NA, bali, matric, 786
brad, NA, sofia, inter, 555
ali, 9999999, bali, matric, 444
The expected output should look like this:
**Name, age, city, edu, phone**
ali, 23, bali, matric, NA
brad, 24, sofia, inter, NA
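To see why a plain `duplicated()` call doesn't help here, the sketch below rebuilds the example as an R data frame and compares full-row duplication with duplication on the identifying columns only. The lowercase column names, and treating `name`, `city` and `edu` as the columns that identify a person, are assumptions based on the sample data.

```r
# Rebuild the example from the question (column names assumed lowercase)
df <- data.frame(
  name  = c("ali", "brad", "ali", "brad", "ali"),
  age   = c(23, 24, NA, NA, 9999999),
  city  = c("bali", "sofia", "bali", "sofia", "bali"),
  edu   = c("matric", "inter", "matric", "inter", "matric"),
  phone = c(NA, NA, 786, 555, 444),
  stringsAsFactors = FALSE
)

# duplicated() compares whole rows, so rows that differ only in
# NA vs. non-NA cells are not treated as duplicates (all FALSE here)
full_dup <- duplicated(df)
print(full_dup)

# Restricting the comparison to the assumed key columns does flag them
key_dup <- duplicated(df[, c("name", "city", "edu")])
print(key_dup)
```

Deduplicating on the key columns keeps the first occurrence of each person. In this sample that happens to be the row with the valid `age`, but it will not be in general, so the row whose `age` should be kept has to be selected explicitly.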
One option is to use `dplyr` together with `magrittr`. You'll need, however, to set a threshold for the `age` parameter, which might not guarantee a unique set of rows (`age` aside). Another option is base R, as follows:
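A minimal base R sketch of the threshold idea follows. It assumes a hypothetical plausibility cut-off `max_age <- 120` and, as an assumption, that `name`, `city` and `edu` identify a person. First it drops rows whose `age` is missing or above the threshold, then it removes any remaining duplicates on the key columns, since the threshold alone may still leave several rows per person.

```r
df <- data.frame(
  name  = c("ali", "brad", "ali", "brad", "ali"),
  age   = c(23, 24, NA, NA, 9999999),
  city  = c("bali", "sofia", "bali", "sofia", "bali"),
  edu   = c("matric", "inter", "matric", "inter", "matric"),
  phone = c(NA, NA, 786, 555, 444),
  stringsAsFactors = FALSE
)

max_age  <- 120                         # hypothetical plausibility threshold
key_cols <- c("name", "city", "edu")    # assumed identifying columns

# 1. Keep rows whose age is present and not above the threshold
#    (!is.na(...) is FALSE for NA ages, so the index contains no NA)
kept <- df[!is.na(df$age) & df$age <= max_age, ]

# 2. The threshold alone may leave several rows per person,
#    so keep only the first row for each key combination
result <- kept[!duplicated(kept[, key_cols]), ]
rownames(result) <- NULL
print(result)
```

On the sample data this returns the two rows of the expected output. Note that a person whose rows all lack a plausible `age` is dropped entirely by the first step; if such people must be kept, sort the rows with a valid `age` first and deduplicate on the key columns without filtering instead.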