List of data frames with the same number of variables and delete duplicates inside one variable and do the same in the rest of the data frames

Question

Asked by Karlos Garcia On 13 May 2021 at 10:09

I have the following list of data frames. Each data frame has 3 variables (a, b and c):

my.list <- list(d1, d2, d3, d4)

Each data frame has duplicated strings in column "a", and I want to delete the rows that contain those duplicated values.

The code I am currently using:

my.listnew <- lapply(my.list, function(x) unique(x["a"]))

The problem with this code is that the other two columns, "b" and "c", are dropped. I want to keep them and delete only the duplicated rows.

There are 3 answers

AnilGoyal On 13 May 2021 at 10:32

Just for reference, here is the tidyverse way of doing it:

set.seed(1)
# Example data: three data frames, each with columns a, b and c
my.list <- list(d1 = data.frame(a = sample(letters[1:3], 5, TRUE),
                                b = rnorm(5),
                                c = rnorm(5)),
                d2 = data.frame(a = sample(letters[1:3], 5, TRUE),
                                b = rnorm(5),
                                c = rnorm(5)),
                d3 = data.frame(a = sample(letters[1:3], 5, TRUE),
                                b = rnorm(5),
                                c = rnorm(5)))
library(tidyverse)
# Keep only the first row for each value of a in every data frame
map(my.list, ~ .x %>% filter(!duplicated(a)))
#> $d1
#>   a         b          c
#> 1 a 1.5952808  0.5757814
#> 2 c 0.3295078 -0.3053884
#> 3 b 0.4874291  0.3898432
#> 
#> $d2
#>   a          b         c
#> 1 b  0.2522234 0.3773956
#> 2 a -0.8919211 0.1333364
#> 
#> $d3
#>   a          b          c
#> 1 a -0.2357066  1.1519118
#> 2 c -0.4333103 -0.4295131
#> 3 b -0.6494716  1.2383041

Created on 2021-05-13 by the reprex package (v2.0.0)

If you also want to combine the data frames into a single output, you can use map_dfr instead of map in the code above.
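As a rough sketch of that variant: the snippet below uses a small hypothetical list (demo.list, not the data from the question) and adds an .id column so each row can be traced back to its source data frame. The column name "source" is my own choice. Note that newer purrr releases mark map_dfr as superseded, so treat this as one possible spelling rather than the recommended one.

```r
library(purrr)
library(dplyr)

# Hypothetical example data with fixed values (not from the question)
demo.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3, c = c(0.1, 0.2, 0.3)),
  d2 = data.frame(a = c("z", "z", "w"), b = 4:6, c = c(0.4, 0.5, 0.6))
)

# Drop rows with a repeated value of a in each data frame, then
# row-bind the results; .id stores the list name in a new column
combined <- map_dfr(demo.list, ~ filter(.x, !duplicated(a)), .id = "source")

combined
```

The duplicates are still removed per data frame, so a value of a that appears in both d1 and d2 would be kept once for each.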

akrun On 13 May 2021 at 17:09

We can use subset directly, without an anonymous function:

out <- lapply(my.list, subset, subset = !duplicated(a))

Or use data.table, where unique takes a by argument:

library(data.table)
out <- lapply(my.list, function(dat) unique(as.data.table(dat), by = 'a'))

Ronak Shah On 13 May 2021 at 10:13 BEST ANSWER

Use duplicated to drop the rows with repeated values in column a while keeping the other columns:

my.listnew <- lapply(my.list, function(x) x[!duplicated(x$a), ])
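To make the difference concrete, here is a minimal base R sketch on small hypothetical data (d1 and d2 below are made up for illustration, since the question does not show the real data frames). It contrasts the code from the question with row indexing by !duplicated.

```r
# Hypothetical example data: two small data frames with columns a, b, c
d1 <- data.frame(a = c("x", "y", "x"), b = 1:3, c = c(0.1, 0.2, 0.3))
d2 <- data.frame(a = c("z", "z", "w"), b = 4:6, c = c(0.4, 0.5, 0.6))
my.list <- list(d1 = d1, d2 = d2)

# Approach from the question: x["a"] is already a one-column data frame,
# so unique() can only return that single column
lost <- lapply(my.list, function(x) unique(x["a"]))

# Row indexing: keep every row whose value of a has not appeared earlier,
# and leave the column index empty so all columns are kept
my.listnew <- lapply(my.list, function(x) x[!duplicated(x$a), ])

names(lost$d1)        # "a"
names(my.listnew$d1)  # "a" "b" "c"
my.listnew$d1$b       # 1 2
my.listnew$d2$b       # 4 6
```

duplicated marks the second and later occurrences, so the first row for each value of a is the one that survives.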
