Removing duplicated values with missing values in a dataframe


I have a data frame that contains duplicated rows with missing values. I want to remove the duplicates while keeping the rows that have a valid value in one particular column (e.g. age in the example below), because that column carries more weight in my model than the others. I tried the methods proposed in "Removing duplicate Values in Dataframe in R", but my data frame is large and the missing values are spread across more than one column. Any suggestions would be appreciated.

**Name, age, city, edu, phone**
ali, 23, bali, matric, NA
brad, 24, sofia, inter, NA
ali, NA, bali, matric, 786
brad, NA, sofia, inter, 555
ali, 9999999, bali, matric, 444

The expected output should look like this:

**Name, age, city, edu, phone**
ali, 23, bali, matric, NA
brad, 24, sofia, inter, NA

Regards,



There is 1 answer

mabdrabo On BEST ANSWER

You can do this with dplyr and magrittr. However, you will need to set a threshold for the age column, and filtering on age alone might not guarantee that the remaining rows are unique in the other columns.

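library(dplyr); library(magrittr)
THRESHOLD <- 100  # upper bound for a plausible age
# filter() also drops rows where age is NA; na.omit() would remove rows whose only NA is phone
df %<>% filter(age < THRESHOLD)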

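Or, using base R:

THRESHOLD <- 100  # upper bound for a plausible age
# keep rows with a known age below the threshold; NAs in other columns are left alone
df <- df[!is.na(df$age) & df$age < THRESHOLD, ]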
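
As noted above, a threshold on age might not leave exactly one row per person. A minimal base R sketch of one way to handle that follows. It rebuilds the question's example data and assumes that name, city and edu together identify a person. That key, the lower-cased column names, and the names `valid`, `key_cols` and `result` are illustrative assumptions rather than part of the original answer.

```r
# Toy data reproducing the question's example (column names lower-cased here)
df <- data.frame(
  name  = c("ali", "brad", "ali", "brad", "ali"),
  age   = c(23, 24, NA, NA, 9999999),
  city  = c("bali", "sofia", "bali", "sofia", "bali"),
  edu   = c("matric", "inter", "matric", "inter", "matric"),
  phone = c(NA, NA, 786, 555, 444),
  stringsAsFactors = FALSE
)

THRESHOLD <- 100  # assumed upper bound for a plausible age

# Step 1: keep rows whose age is known and below the threshold;
# missing values in other columns (e.g. phone) are deliberately ignored
valid <- df[!is.na(df$age) & df$age < THRESHOLD, ]

# Step 2: keep the first remaining row for each assumed key (name, city, edu)
key_cols <- c("name", "city", "edu")
result <- valid[!duplicated(valid[, key_cols]), ]
rownames(result) <- NULL

print(result)
```

If two rows for the same person both carry a plausible age, this keeps whichever comes first, so sort the data frame beforehand if another row should win.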