Removing duplicated values with missing values in a dataframe

Question

1k views Asked by AMR At 18 December 2016 at 17:27

I have a data frame that contains duplicated rows with missing values. I want to remove the duplicated rows while retaining the data in one particular column (e.g. age in the example below). That column's value carries more weight in the model than the others, so I would like to keep its data. I tried the methods proposed in "Removing duplicate Values in Dataframe in R", but my data frame is large and the missing values are spread across more than one column. Any suggestions will be appreciated.

**Name, age, city, edu, phone**
ali, 23, bali, matric, NA
brad, 24, sofia, inter, NA
ali, NA, bali, matric, 786
brad, NA, sofia, inter, 555
ali, 9999999, bali, matric, 444

The expected output should look like this:

**Name, age, city, edu, phone**
ali, 23, bali, matric, NA
brad, 24, sofia, inter, NA

Regards,

There is 1 answer

**mabdrabo** · Accepted Answer · 2016-12-18T18:42:19+00:00

Using dplyr and magrittr. You will, however, need to set a threshold for age, and that alone might not guarantee a unique set of rows once age is set aside.

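library(dplyr)     # filter() and %>%
library(magrittr)  # %<>% (assignment pipe)
THRESHOLD <- 100   # upper bound for a plausible age
df %<>% na.omit() %>% filter(age < THRESHOLD)  # na.omit() drops rows with an NA in any column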

Or, using base R:

THRESHOLD <- 100
df <- df[complete.cases(df),]
df <- df[df$age < THRESHOLD,]
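
As a hedged variation on the above, here is a minimal base R sketch that filters on age only, so that missing values in other columns do not cause a row to be dropped. On the sample data in the question, every row with a plausible age has NA in phone, so requiring complete cases would discard those rows as well. The lowercase column names, the THRESHOLD value of 100, and the choice to keep the first row per name are assumptions for illustration, not part of the original question or answer.

```r
# Example data from the question (column names lowercased for convenience).
df <- data.frame(
  name  = c("ali", "brad", "ali", "brad", "ali"),
  age   = c(23, 24, NA, NA, 9999999),
  city  = c("bali", "sofia", "bali", "sofia", "bali"),
  edu   = c("matric", "inter", "matric", "inter", "matric"),
  phone = c(NA, NA, 786, 555, 444),
  stringsAsFactors = FALSE
)

THRESHOLD <- 100  # assumed upper bound for a plausible age

# Keep rows whose age is present and plausible; NAs in other columns are tolerated.
valid_age <- !is.na(df$age) & df$age < THRESHOLD
result <- df[valid_age, ]

# Guard against several valid rows for one person: keep the first row per name.
result <- result[!duplicated(result$name), ]
rownames(result) <- NULL

print(result)
```

If a person can be identified only by a combination of columns, the duplicated() call could be given those columns together (for example result[, c("name", "city")]) instead of name alone.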
