Remove rows with duplicated values in one column from every data frame in a list, keeping the other columns

I have the following list of data frames, and each data frame has 3 variables (a, b and c):

my.list <- list(d1, d2, d3, d4)

Within each data frame, column "a" contains duplicated strings, and I want to delete the rows with duplicated values.

The code I am currently using:

my.listnew <- lapply(my.list, function(x) unique(x["a"]))

The problem with this code is that the other 2 columns, "b" and "c", are gone. I want to keep them and delete only the duplicated rows.

There are 3 answers

Ronak Shah (best answer)

Use duplicated() to drop the rows whose value in column a has already appeared, while keeping the other columns. The first occurrence of each value is retained.

my.listnew <- lapply(my.list, function(x) x[!duplicated(x$a), ])
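
As a minimal sketch of what this does, the toy list below is invented for illustration (it is not the asker's data). It assumes every data frame has a column named a, as described in the question.

```r
# Hypothetical toy list, invented for illustration only
my.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3, c = c(10, 20, 30)),
  d2 = data.frame(a = c("p", "p", "q"), b = 4:6, c = c(40, 50, 60))
)

# duplicated() flags the second and later occurrences of each value,
# so negating it keeps the first row seen for every value of a
my.listnew <- lapply(my.list, function(x) x[!duplicated(x$a), ])

my.listnew$d1   # rows 1 and 2 of d1, with a, b and c intact
my.listnew$d2   # rows 1 and 3 of d2
```

If the last occurrence of each value should be kept instead, duplicated() accepts fromLast = TRUE.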

AnilGoyal

Just for reference, the tidyverse way of doing it:

set.seed(1)
my.list <- list(d1 = data.frame(a = sample(letters[1:3], 5, T),
                                b = rnorm(5),
                                c = rnorm(5)), 
                d2 = data.frame(a = sample(letters[1:3], 5, T),
                                b = rnorm(5),
                                c = rnorm(5)), 
                d3 = data.frame(a = sample(letters[1:3], 5, T),
                                b = rnorm(5),
                                c = rnorm(5)))
library(tidyverse)
map(my.list, ~ .x %>% filter(!duplicated(a)) )
#> $d1
#>   a         b          c
#> 1 a 1.5952808  0.5757814
#> 2 c 0.3295078 -0.3053884
#> 3 b 0.4874291  0.3898432
#> 
#> $d2
#>   a          b         c
#> 1 b  0.2522234 0.3773956
#> 2 a -0.8919211 0.1333364
#> 
#> $d3
#>   a          b          c
#> 1 a -0.2357066  1.1519118
#> 2 c -0.4333103 -0.4295131
#> 3 b -0.6494716  1.2383041

Created on 2021-05-13 by the reprex package (v2.0.0)

If you also want to combine the data frames into a single output, you can use map_dfr instead of map in the code above.
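
A short sketch of the combined version, using an invented toy list rather than the data above. The column name "source" passed to .id is an arbitrary choice for illustration.

```r
library(purrr)
library(dplyr)

# Hypothetical toy list, invented for illustration only
my.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3),
  d2 = data.frame(a = c("p", "p", "q"), b = 4:6)
)

# map_dfr() applies the function to each element and row-binds the results;
# .id adds a column (here called "source") holding the list element names
combined <- map_dfr(my.list, ~ filter(.x, !duplicated(a)), .id = "source")
combined
```

Note that recent purrr releases mark map_dfr() as superseded in favour of map() followed by list_rbind(), so the same idea may be written that way in newer code.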

akrun

We can use subset() directly, without an anonymous function:

out <- lapply(my.list, subset, subset = !duplicated(a))
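
This works because subset() evaluates its subset argument inside each data frame, so a refers to that data frame's column. As a rough check on an invented toy list, the result can be compared with the explicit indexing approach:

```r
# Hypothetical toy list, invented for illustration only
my.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3, c = c(10, 20, 30)),
  d2 = data.frame(a = c("p", "p", "q"), b = 4:6, c = c(40, 50, 60))
)

# subset() looks up `a` inside each data frame (non-standard evaluation)
with_subset <- lapply(my.list, subset, subset = !duplicated(a))

# The same filter written with explicit row indexing
with_index <- lapply(my.list, function(x) x[!duplicated(x$a), ])

# Compare the two results
all.equal(with_subset, with_index)
```

The R documentation describes subset() as a convenience function intended mainly for interactive use, so the explicit indexing form may be the safer choice inside packages or functions.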

Or use data.table, where unique() takes a by argument naming the column(s) to deduplicate on:

library(data.table)
out <- lapply(my.list, function(dat) unique(as.data.table(dat), by = 'a'))
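
Each element of out is then a data.table rather than a plain data.frame. If plain data frames are needed downstream, one possible sketch (on an invented toy list) converts them back:

```r
library(data.table)

# Hypothetical toy list, invented for illustration only
my.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3),
  d2 = data.frame(a = c("p", "p", "q"), b = 4:6)
)

# unique(..., by = "a") keeps the first row for each value of a;
# as.data.frame() then converts each data.table back to a data.frame
out_df <- lapply(my.list, function(dat) {
  as.data.frame(unique(as.data.table(dat), by = "a"))
})

sapply(out_df, class)
```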