Generating group id for pairs of data frames in R

Question

64 views Asked by HSJ At 14 November 2023 at 10:17

I have six data frames. Some of them have the same structure and values, and some do not. I compared every pairwise combination of them with identical() in a loop and summarized the comparison results in the following table.

result <- data.frame(source=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5), dest=c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6), TF=c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F"))

> result
   source dest TF
1       1    2  T
2       1    3  F
3       1    4  F
4       1    5  F
5       1    6  F
6       2    3  T
7       2    4  F
8       2    5  F
9       2    6  F
10      3    4  F
11      3    5  F
12      3    6  F
13      4    5  T
14      4    6  F
15      5    6  F

source and dest list every pairwise combination of the six data frames. TF records whether the two data frames in a pair are identical.
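For readers who want to reproduce a table of this shape, here is a minimal sketch of the pairwise comparison, assuming the data frames are held in a list. The list `dfs`, its toy contents, and the name `result_sketch` are made up for illustration and are not the asker's data. Because identical() is transitive, this sketch also marks the pair (1, 3) as T, unlike the table above.

```r
# Hypothetical input: a list of six data frames (contents invented for this sketch).
dfs <- list(
  data.frame(x = 1:3),
  data.frame(x = 1:3),
  data.frame(x = 1:3),
  data.frame(y = c("a", "b")),
  data.frame(y = c("a", "b")),
  data.frame(z = 0)
)

# All unordered pairs (i, j) with i < j, one pair per column.
pairs <- combn(seq_along(dfs), 2)

# Compare each pair with identical(); TRUE means the two data frames match exactly.
same <- apply(pairs, 2, function(p) identical(dfs[[p[1]]], dfs[[p[2]]]))

# Store the outcome as "T" / "F" strings, mirroring the layout used in the question.
result_sketch <- data.frame(
  source = pairs[1, ],
  dest   = pairs[2, ],
  TF     = ifelse(same, "T", "F")
)
```

combn() orders the pairs as (1,2), (1,3), ..., (5,6), which matches the row order of the table in the question.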

For instance, data frames 1 and 2 are the same, and 2 and 3 are the same as well. Thus data frames 1, 2 and 3 are all the same and should share one unique group id. Next, data frames 4 and 5 are the same, so they get another group id. Data frame 6 is not identical to any other data frame, so it gets a group id of its own. The expected result is the following table.

> group_id
dfID GroupID
1 1
2 1
3 1
4 2
5 2
6 3

Is there any way to convert the result table into group_id? The challenge is how to chain together the data frames linked by T rows that share a data frame number in source or dest, so that each group can be identified. Once the groups are identified, we can simply number them sequentially from the first to the last. Another challenge is that there may be several unique data frames that are not identical to any other; they need a unique group id as well.

Original Q&A

There are 2 answers

zx8754 On 14 November 2023 at 10:45

Using igraph, get the membership for rows where TF is T:

library(igraph)

ix <- result$TF == "T"
g <- graph_from_data_frame(result[ ix, ])
plot(g)

cm <- components(g)$membership
cm
# 1 2 4 3 5 
# 1 1 2 1 2 
result$grp[ ix ] <- cm[ as.character(result$source[ ix ]) ]

result[ ix, ]
#    source dest TF grp
# 1       1    2  T   1
# 6       2    3  T   1
# 13      4    5  T   2
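The snippet above labels only the rows where TF is T, so data frame 6 never receives a group. One possible way to get the full dfID / GroupID table the question asks for is to pass every id through the `vertices` argument of graph_from_data_frame(), so that unmatched data frames become isolated vertices. This is a sketch, not part of the original answer; the names `ids` and `group_id` are chosen here for illustration.

```r
library(igraph)

result <- data.frame(
  source = c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5),
  dest   = c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6),
  TF     = c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F")
)

ix <- result$TF == "T"

# Every data frame id that appears anywhere in the table, matched or not.
ids <- sort(unique(c(result$source, result$dest)))

# Build the graph from the "T" edges only, but declare all ids as vertices
# so that unmatched data frames (here 6) are kept as isolated vertices.
g <- graph_from_data_frame(
  result[ix, c("source", "dest")],
  directed = FALSE,
  vertices = data.frame(name = ids)
)

# Component membership, one entry per vertex, in the same order as ids.
m <- components(g)$membership

# Renumber the groups in order of first appearance so they run 1, 2, 3, ...
group_id <- data.frame(
  dfID    = ids,
  GroupID = match(m, unique(m))
)
```

The match(m, unique(m)) step is a precaution: it makes the sequential numbering explicit instead of relying on the order in which igraph happens to number components.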

ThomasIsCoding On 14 November 2023 at 10:45 (Accepted Answer)

You can use subgraph.edges and components along with membership to do this:

library(igraph)

result %>%
  graph_from_data_frame() %>%
  subgraph.edges(which(E(.)$TF == "T"), delete.vertices = FALSE) %>%
  components() %>%
  membership() %>%
  stack() %>%
  rev() %>%
  setNames(c("dfID", "GroupID")) %>%
  type.convert(as.is = TRUE)

which gives output like

  dfID GroupID
1    1       1
2    2       1
3    3       1
4    4       2
5    5       2
6    6       3
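If installing igraph is not an option, the chaining described in the question can also be sketched in base R with simple label propagation: every data frame starts in its own group, and each T pair repeatedly pulls both members down to the smaller label until nothing changes. This is an illustrative alternative, not taken from either answer; the names `label` and `group_id_base` are hypothetical.

```r
result <- data.frame(
  source = c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5),
  dest   = c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6),
  TF     = c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F")
)

ids   <- sort(unique(c(result$source, result$dest)))
edges <- result[result$TF == "T", c("source", "dest")]

# Start with every data frame in its own group, labelled by its own id.
label <- setNames(ids, ids)

# Give both ends of each "T" pair the smaller of their two labels and repeat
# until a full pass changes nothing; connected data frames then share a label.
repeat {
  changed <- FALSE
  for (k in seq_len(nrow(edges))) {
    a <- as.character(edges$source[k])
    b <- as.character(edges$dest[k])
    smallest <- min(label[a], label[b])
    if (label[a] != smallest || label[b] != smallest) {
      label[c(a, b)] <- smallest
      changed <- TRUE
    }
  }
  if (!changed) break
}

# Turn the surviving labels (1, 4, 6 here) into sequential group ids.
group_id_base <- data.frame(
  dfID    = ids,
  GroupID = match(label, unique(label))
)
```

Data frames that never appear in a T row keep their initial label, so they end up in a group of their own without special handling. For large tables a proper union-find structure or igraph would likely scale better than this loop.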
