count values in one dataframe in another dataframe

Question

count values in one dataframe in another dataframe

91 views Asked by Caterina At 22 January 2024 at 20:37

Assume I have a dataframe like this:

df <- data.frame(
  D = c('A', 'B', 'C'),
  Q = c('asd', 'reg', 'rt'),
  id = I(list(c(124, 532, 78), c(1, 3, 532), c(2, 3, 78, 124, 1)))
)

and another one like this:

clusters <- data.frame(
 g = I(list(c(124, 78, 1),c(2, 3, 89),c(532, 533)))
)

I want to count the times the elements in each id list are in the groups g1, g2, g3 (each row in clusters) when it's more than once. So basically count the intersection whenever the intersection > 1.

The output should be:

out <- data.frame(
  D = c('A', 'B', 'C'),
  Q = c('asd', 'reg', 'rt'),
  id = I(list(c(124, 532, 78), c(1, 3, 532), c(2,3,78, 124, 1))),
  count = c(2, 0, 5)
)

I cannot do it with for loop cause it will take forever as I have 70k rows.

A for loop would look like this:

for row in df:
  for group in clusters:
     if group intersect row$id > 1
        count=count+intersection
  count=0

Original Q&A

There are 3 answers

r2evans On 22 January 2024 at 21:04

There's no way around some bespoke code.

sapply(df$id, function(id1) {
  lens <- sapply(clusters$g, function(z) length(intersect(z, id1)))
  sum(replace(lens, lens == 1, 0))
})
# [1] 2 0 5

df <- structure(list(D = c("A", "B", "C"), Q = c("asd", "reg", "rt"), id = structure(list(c(124, 532, 78), c(1, 3, 532), c(2, 3, 78, 124, 1)), class = "AsIs")), class = "data.frame", row.names = c(NA, -3L))

M-- On 22 January 2024 at 22:07

Here's another tidyverse solution, but you're probably better off using base solutions provided for performance.

library(tidyverse)

df %>% 
  unnest(id) %>% 
  left_join({clusters %>% 
      rownames_to_column("r") %>% 
      unnest(g)}, by = c("id" = "g")) %>% 
  add_count(D, Q, r) %>% 
  summarise(id = list(id), n = first(n * (n > 1)), .by = c(D, Q, r)) %>% 
  summarise(id = list(unlist(id)), count = sum(n) , .by = c(D, Q)) %>% 
  as.data.frame()

#>   D   Q               id count
#> 1 A asd     124, 78, 532     2
#> 2 B reg        1, 3, 532     0
#> 3 C  rt 2, 3, 78, 124, 1     5

^{Created on 2024-01-22 with reprex v2.0.2}

**ThomasIsCoding** · Accepted Answer · 2024-01-22T20:53:16+00:00

Probably you can try base R

transform(
    df,
    count = rowSums(outer(id, clusters$g, \(...) mapply(\(...) {
        l <- length(intersect(...))
        l * (l > 1)
    }, ...)))
)

or dplyr

df %>%
    left_join(df %>%
        unnest(id) %>%
        left_join(clusters %>%
            mutate(grp = row_number()) %>%
            unnest(g), by = join_by(id == g)) %>%
        summarise(count = {
            d <- table(grp)
            sum(d[d > 1])
        }, .by = c(D, Q)))

which gives

  D   Q           id count
1 A asd 124, 532, 78     2
2 B reg    1, 3, 532     0
3 C  rt 2, 3, 78....     5

TechQA.

count values in one dataframe in another dataframe

There are 3 answers

Related Questions in R

Related Questions in DATAFRAME

Related Questions in GROUP-BY

Related Questions in INTERSECT

Popular Questions

Trending Questions