Create 'dummy' edges for missing edges in an edge list


I have a collection of nodes, each associated with a given year like this:

nodes <- data.frame(name = c("a1","a2","a3","a4","a5","b1","b2","b3","b4","c1","c2"),
                    yr =   c("yr1","yr2","yr3","yr4","yr5","yr2","yr3","yr4","yr5","yr4","yr5"))
   name  yr
1    a1 yr1
2    a2 yr2
3    a3 yr3
4    a4 yr4
5    a5 yr5
6    b1 yr2
7    b2 yr3
8    b3 yr4
9    b4 yr5
10   c1 yr4
11   c2 yr5

To help visualize this, see the table below, where each column is a year and each row is a group of nodes.

data.frame(
  yr1 = c("a1",NA,NA),
  yr2 = c("a2","b1",NA),
  yr3 = c("a3","b2",NA),
  yr4 = c("a4","b3","c1"),
  yr5 = c("a5","b4","c2"))
   yr1  yr2  yr3 yr4 yr5
1   a1   a2   a3  a4  a5
2 <NA>   b1   b2  b3  b4
3 <NA> <NA> <NA>  c1  c2
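For reference, this wide table can be built directly from `nodes`. A minimal base R sketch, assuming a node's group is the letter prefix of its name (the `grp` column and the `wide` matrix are illustrative names, not part of the original data):

```r
nodes <- data.frame(name = c("a1","a2","a3","a4","a5","b1","b2","b3","b4","c1","c2"),
                    yr   = c("yr1","yr2","yr3","yr4","yr5","yr2","yr3","yr4","yr5","yr4","yr5"))

# Assumption: the group is whatever precedes the trailing digits ("a1" -> "a")
nodes$grp <- sub("[0-9]+$", "", nodes$name)

years  <- unique(nodes$yr)   # yr1..yr5, in order of first appearance
groups <- unique(nodes$grp)  # a, b, c

# Empty group x year grid; cells with no node stay NA
wide <- matrix(NA_character_, nrow = length(groups), ncol = length(years),
               dimnames = list(groups, years))

# Drop each node into its (group, year) cell using integer row/column indices
wide[cbind(match(nodes$grp, groups), match(nodes$yr, years))] <- nodes$name
wide
```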

Now imagine this as an edge list where each node is connected to the node in the same row for the next year (yr1 to yr2, yr2 to yr3, and so on). The edge list will look like this:

edges <- data.frame(source = c("a1","a2","a3","a4","b1","b2","b3","c1"),
                    target = c("a2","a3","a4","a5","b2","b3","b4","c2"))
  source target
1     a1     a2
2     a2     a3
3     a3     a4
4     a4     a5
5     b1     b2
6     b2     b3
7     b3     b4
8     c1     c2
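This edge list follows mechanically from `nodes`: sort by group and year, then link each node to the next one in the same group. A base R sketch, assuming the letter-prefix grouping, year labels of the form `yr<k>`, and no interior gaps within a group (all true in the example; `grp` and `yr_i` are hypothetical helper columns):

```r
nodes <- data.frame(name = c("a1","a2","a3","a4","a5","b1","b2","b3","b4","c1","c2"),
                    yr   = c("yr1","yr2","yr3","yr4","yr5","yr2","yr3","yr4","yr5","yr4","yr5"))

nodes$grp  <- sub("[0-9]+$", "", nodes$name)        # assumed grouping
nodes$yr_i <- as.integer(sub("^yr", "", nodes$yr))  # numeric year, assumes "yr<k>"
nodes <- nodes[order(nodes$grp, nodes$yr_i), ]

# Pair every row with the following row, keeping only pairs within one group
same_grp <- head(nodes$grp, -1) == tail(nodes$grp, -1)
edges <- data.frame(source = head(nodes$name, -1)[same_grp],
                    target = tail(nodes$name, -1)[same_grp])
edges
```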

What I want to do is fill in the missing edges with dummy edges. This is easiest to see in the first table: with every NA replaced by a dummy node named after its year ("01" for yr1, "02" for yr2, and so on), it would look like this:

data.frame(
  yr1 = c("a1","01","01"),
  yr2 = c("a2","b1","02"),
  yr3 = c("a3","b2","03"),
  yr4 = c("a4","b3","c1"),
  yr5 = c("a5","b4","c2"))
  yr1 yr2 yr3 yr4 yr5
1  a1  a2  a3  a4  a5
2  01  b1  b2  b3  b4
3  01  02  03  c1  c2
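In the wide form the fill is straightforward. One base R way, assuming the columns are already in year order so that column position `j` gives the dummy name `sprintf("%02d", j)`; a small loop stands in for a chain of if_else() calls:

```r
wide <- data.frame(
  yr1 = c("a1",NA,NA),
  yr2 = c("a2","b1",NA),
  yr3 = c("a3","b2",NA),
  yr4 = c("a4","b3","c1"),
  yr5 = c("a5","b4","c2"))

# Replace every NA in column j with the zero-padded column number
for (j in seq_along(wide)) {
  wide[[j]][is.na(wide[[j]])] <- sprintf("%02d", j)
}
wide
```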

In the edge list version of the data, it would look like this:

edges_new <- data.frame(source = c("a1","a2","a3","a4","01","b1","b2","b3","01","02","03","c1"),
                        target = c("a2","a3","a4","a5","b1","b2","b3","b4","02","03","c1","c2"))
   source target
1      a1     a2
2      a2     a3
3      a3     a4
4      a4     a5
5      01     b1
6      b1     b2
7      b2     b3
8      b3     b4
9      01     02
10     02     03
11     03     c1
12     c1     c2
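Going from the filled wide table to this edge list amounts to reading each row left to right and pairing adjacent columns. A base R sketch, again assuming the columns are in year order (`filled` is just the filled table above under an illustrative name):

```r
filled <- data.frame(
  yr1 = c("a1","01","01"),
  yr2 = c("a2","b1","02"),
  yr3 = c("a3","b2","03"),
  yr4 = c("a4","b3","c1"),
  yr5 = c("a5","b4","c2"))

m <- as.matrix(filled)
k <- ncol(m)

# Column j is the source and column j + 1 the target; t() makes as.vector()
# walk the matrix row by row instead of column by column
edges_new <- data.frame(source = as.vector(t(m[, -k])),
                        target = as.vector(t(m[, -1])))
edges_new
```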

So to summarize: given the 'nodes' and 'edges' data frames, how can I calculate 'edges_new'? I can work it out in the wide version of the table, where each year has its own column, using a series of if_else() calls. But I'm not sure how to do it from an edge list.

Here are the three data frames together:

nodes <- data.frame(name = c("a1","a2","a3","a4","a5","b1","b2","b3","b4","c1","c2"),
                    yr =   c("yr1","yr2","yr3","yr4","yr5","yr2","yr3","yr4","yr5","yr4","yr5"))
edges <- data.frame(source = c("a1","a2","a3","a4","b1","b2","b3","c1"),
                    target = c("a2","a3","a4","a5","b2","b3","b4","c2"))
edges_new <- data.frame(source = c("a1","a2","a3","a4","01","b1","b2","b3","01","02","03","c1"),
                        target = c("a2","a3","a4","a5","b1","b2","b3","b4","02","03","c1","c2"))
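One way to work directly from the edge list, sketched in base R: a chain starts at any source that never appears as a target; prepend dummy nodes for the years before that start, then follow the existing edges to the end of the chain. This assumes year labels of the form `yr<k>`, at most one outgoing edge per node, no cycles, and that every chain already runs to the final year (all true in the example). `yr_index` and `build_chain` are hypothetical names:

```r
nodes <- data.frame(name = c("a1","a2","a3","a4","a5","b1","b2","b3","b4","c1","c2"),
                    yr   = c("yr1","yr2","yr3","yr4","yr5","yr2","yr3","yr4","yr5","yr4","yr5"))
edges <- data.frame(source = c("a1","a2","a3","a4","b1","b2","b3","c1"),
                    target = c("a2","a3","a4","a5","b2","b3","b4","c2"))

# Year number of each node, looked up by name (assumes labels like "yr3")
yr_index <- setNames(as.integer(sub("^yr", "", nodes$yr)), nodes$name)

# A chain starts at a node that is a source but never a target
starts <- setdiff(edges$source, edges$target)

build_chain <- function(s) {
  # Dummies "01", "02", ... for every year before the chain's first real node
  path <- c(sprintf("%02d", seq_len(yr_index[[s]] - 1)), s)
  # Follow the existing edges from the first real node to the end of the chain
  repeat {
    nxt <- edges$target[edges$source == path[length(path)]]
    if (length(nxt) == 0) break
    path <- c(path, nxt)
  }
  # Consecutive nodes on the path become source/target pairs
  data.frame(source = head(path, -1), target = tail(path, -1))
}

edges_new <- do.call(rbind, lapply(starts, build_chain))
edges_new
```

Here `nodes` is only used to look up each chain's starting year. A group consisting of a single node has no edges and would not be picked up this way.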

Answer by Darren Tsai:

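The workflow might be: take each node's group from the first letter of its name, use complete() to add a row for every missing group/year combination, name those new rows with the zero-padded year index, and then pair each node with the next one in its group: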

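library(dplyr)
library(tidyr)

nodes %>%
  mutate(grp = substr(name, 1, 1)) %>%   # group = first letter of the name
  complete(grp, yr) %>%                  # add rows for missing group/year pairs
  mutate(name = coalesce(name, sprintf('%02d', match(yr, unique(yr))))) %>%  # dummy name = year index
  group_by(grp) %>%
  transmute(source = name, target = lead(name)) %>%  # link each node to the next year's node
  ungroup() %>%
  drop_na()                              # drop each group's last row (it has no target)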
# A tibble: 12 × 3
   grp   source target
   <chr> <chr>  <chr> 
 1 a     a1     a2    
 2 a     a2     a3    
 3 a     a3     a4    
 4 a     a4     a5    
 5 b     01     b1    
 6 b     b1     b2    
 7 b     b2     b3    
 8 b     b3     b4    
 9 c     01     02    
10 c     02     03    
11 c     03     c1    
12 c     c1     c2
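One caveat: as far as I understand tidyr's expand() semantics, complete() builds the missing rows from the sorted unique values of `yr` (or from the levels, if `yr` is a factor), and both `match(yr, unique(yr))` and lead() then rely on that order. Character sorting puts "yr10" before "yr2", so with ten or more years the dummy numbers and the edges would come out wrong. A quick base R check, plus one possible fix that should avoid it (`yrs` and `yr_fac` are illustrative names):

```r
yrs <- c("yr1", "yr2", "yr10")

# Character sorting compares digit by digit, so "yr10" lands before "yr2"
sort(yrs)

# Ordering on the numeric part restores chronological order
yrs_num <- yrs[order(as.integer(sub("^yr", "", yrs)))]

# Converting yr to a factor with levels in that order before complete()
# should make the expansion follow chronological order instead
yr_fac <- factor(yrs, levels = yrs_num)
levels(yr_fac)
```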