Create bipartite projection based on condition applied to two edges

I've got a dataset with distinct source and target nodes, as well as a numeric variable that's relevant to the relationship.

It looks a bit like this:

library(igraph)
library(tidygraph)

set.seed(24601)

example_data <- 
  data.frame(source = 
             sample(letters[1:10],
                    100,
                    replace = TRUE),
           target =
             sample(letters[16:25],
                    100,
                    replace = TRUE),
           important_variable =
             rnorm(100))

Imagine that the members of source are individuals, members of target are different cities that they've travelled to, and I want to create a network that shows when two given cities were visited by the same person. I'd use bipartite_projection() for this, like so:


example_data %>% 
  graph_from_data_frame() %>% 
  as_tbl_graph() %>% 
  # type is TRUE for individuals (source) and FALSE for cities (target)
  mutate(type = name %in% letters[1:10]) %>% 
  # project onto the FALSE side, i.e. the cities
  bipartite_projection(which = "false")

However, I'd like to connect two cities only when a certain condition is met: for example, when the values of important_variable differ by at most 0.5 (say, I'm interested in cases where two cities were visited by the same person in the same year). At the moment, the information in important_variable is discarded by bipartite_projection().

I can't see a means of restricting the bipartite_projection based on a third numeric variable. Is it possible to do so? Thanks in advance for any help.

Update with edit to show the desired output:

Let's look at a small number of rows:

example_data %>% 
  filter(source == "a") %>% 
  head()

This produces the following:

  source target important_variable
1      a      x         0.29773720
2      a      p         1.50474490
3      a      y         0.01149263
4      a      q         0.19391773
5      a      t        -0.10656946
6      a      w        -0.29516668

I can go straight into a bipartite projection, like so:

example_data %>% 
  filter(source == "a") %>% 
  head()  %>% 
  graph_from_data_frame() %>% 
  as_tbl_graph() %>% 
  mutate(type = 
           ifelse(name %in% letters[1:10],
                  TRUE,
                  FALSE)) %>% 
  bipartite_projection(which = "false")

which produces an igraph object with one vertex attribute (name) and one edge attribute (weight).
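A quick way to check which attributes a projection carries is to inspect a tiny graph. The sketch below uses a made-up edge list (`toy_edges` and its values are hypothetical, not taken from example_data) in which two people each visit the same two cities.

```r
library(igraph)

# Hypothetical toy data: persons "a" and "b" both visit cities "x" and "y".
toy_edges <- data.frame(
  source = c("a", "a", "b", "b"),
  target = c("x", "y", "x", "y")
)

g <- graph_from_data_frame(toy_edges, directed = FALSE)

# type is TRUE for persons and FALSE for cities.
V(g)$type <- V(g)$name %in% toy_edges$source

# Project onto the FALSE side (the cities).
proj <- bipartite_projection(g, which = "false")

vertex_attr_names(proj)  # vertex attributes kept by the projection
edge_attr_names(proj)    # edge attributes added by the projection
E(proj)$weight           # number of persons shared by each pair of cities
```

The edge weight only counts shared neighbours, so any other edge attribute of the original graph, such as important_variable, has to be re-attached afterwards.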

However, I'd like something that looks like this (just the first four rows for simplicity):

  source_projected target_projected source_att  target_att
1                x                p  0.2977372  1.50474490
2                x                y  0.2977372  0.01149263
3                x                q  0.2977372  0.19391773
4                x                t  0.2977372 -0.10656946

as this would allow me to filter based on the relationship between my source_att and target_att columns (for example, keeping rows where the difference between source_att and target_att is less than 0.5).

Second update, with more detailed desired output

@ThomasIsCoding has provided a solution that fits with my request. This has made me realise that I wasn't sufficiently detailed.

Starting again with the original data, we can see that a is linked to p twice, and a is linked to y twice. In each case, the value of important_variable is different. See below:

example_data %>% 
  filter(source == "a" &
           (target == "p" |
              target == "y")) 

  source target important_variable
1      a      p         1.50474490
2      a      y         0.01149263
3      a      y        -2.34069094
4      a      p         0.29294049

The desired output that I posted earlier connects each pair of nodes within target only once. However, because the values of important_variable differ, I'd like the output to include every combination of those values, like so:

  source_projected target_projected source_att  target_att
1                p                y  1.5047449  0.01149263
2                p                y  1.5047449 -2.34069094
3                p                y  0.2929405  0.01149263
4                p                y  0.2929405 -2.34069094

Is this something that's possible to construct? Thanks!

1 Answer

ThomasIsCoding (best answer)

Update

Since a single target can have several values of important_variable, it is probably better to use left_join() with relationship = "many-to-many", so that every combination of values is kept:

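library(dplyr) # select(), left_join(), rename(); join_by() needs dplyr >= 1.1.0
library(purrr) # pluck()

out <- example_data %>%
    graph_from_data_frame() %>%
    # mark the target (city) vertices as type TRUE
    set_vertex_attr(
        name = "type",
        value = names(V(.)) %in% example_data$target
    ) %>%
    bipartite_projection() %>%
    # proj2 is the projection onto the type == TRUE vertices (the cities)
    pluck("proj2") %>%
    # namespaced so that another package's as_data_frame() cannot mask it
    igraph::as_data_frame() %>%
    select(-weight) %>%
    # attach every value recorded for the "from" city ...
    left_join(select(example_data, -source),
        join_by(from == target),
        relationship = "many-to-many"
    ) %>%
    # ... and every value recorded for the "to" city
    left_join(select(example_data, -source),
        join_by(to == target),
        relationship = "many-to-many"
    ) %>%
    rename(all_of(c(source_att = "important_variable.x", target_att = "important_variable.y")))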

and you will see

> head(out)
  from to source_att  target_att
1    x  y  0.2977372  0.50506407
2    x  y  0.2977372 -1.37333412
3    x  y  0.2977372  0.61981223
4    x  y  0.2977372  0.43724194
5    x  y  0.2977372 -1.97363488
6    x  y  0.2977372 -0.02413137

> glimpse(out)
Rows: 4,462
Columns: 4
$ from       <chr> "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",…
$ to         <chr> "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",…
$ source_att <dbl> 0.2977372, 0.2977372, 0.2977372, 0.2977372, 0.2977372, 0.29…
$ target_att <dbl> 0.50506407, -1.37333412, 0.61981223, 0.43724194, -1.9736348…

Previous

You could try the code below:

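library(dplyr) # select(), mutate()
library(purrr) # pluck()

example_data %>%
    graph_from_data_frame() %>%
    # mark the target (city) vertices as type TRUE
    set_vertex_attr(
        name = "type",
        value = names(V(.)) %in% example_data$target
    ) %>%
    bipartite_projection() %>%
    # proj2 is the projection onto the type == TRUE vertices (the cities)
    pluck("proj2") %>%
    # namespaced so that another package's as_data_frame() cannot mask it
    igraph::as_data_frame() %>%
    select(-weight) %>%
    # match() returns the first matching row only, so each city gets one value
    mutate(
        source_att = with(example_data, important_variable[match(from, target)]),
        target_att = with(example_data, important_variable[match(to, target)])
    )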

which gives

   from to  source_att  target_att
1     x  y  0.29773720  0.50506407
2     x  p  0.29773720 -0.74022203
3     x  u  0.29773720 -2.04969760
4     x  q  0.29773720  1.36281039
5     x  w  0.29773720 -0.47578690
6     x  s  0.29773720  0.03233063
7     x  t  0.29773720 -1.08378137
8     x  r  0.29773720 -0.72029435
9     x  v  0.29773720 -0.22919308
10    y  p  0.50506407 -0.74022203
11    y  u  0.50506407 -2.04969760
12    y  q  0.50506407  1.36281039
13    y  w  0.50506407 -0.47578690
14    y  s  0.50506407  0.03233063
15    y  t  0.50506407 -1.08378137
16    y  r  0.50506407 -0.72029435
17    y  v  0.50506407 -0.22919308
18    p  u -0.74022203 -2.04969760
19    p  q -0.74022203  1.36281039
20    p  w -0.74022203 -0.47578690
21    p  s -0.74022203  0.03233063
22    p  t -0.74022203 -1.08378137
23    p  r -0.74022203 -0.72029435
24    p  v -0.74022203 -0.22919308
25    r  u -0.72029435 -2.04969760
26    r  q -0.72029435  1.36281039
27    r  w -0.72029435 -0.47578690
28    r  s -0.72029435  0.03233063
29    r  t -0.72029435 -1.08378137
30    r  v -0.72029435 -0.22919308
31    u  q -2.04969760  1.36281039
32    u  w -2.04969760 -0.47578690
33    u  s -2.04969760  0.03233063
34    u  t -2.04969760 -1.08378137
35    u  v -2.04969760 -0.22919308
36    v  s -0.22919308  0.03233063
37    v  t -0.22919308 -1.08378137
38    v  q -0.22919308  1.36281039
39    v  w -0.22919308 -0.47578690
40    q  w  1.36281039 -0.47578690
41    q  s  1.36281039  0.03233063
42    q  t  1.36281039 -1.08378137
43    w  s -0.47578690  0.03233063
44    w  t -0.47578690 -1.08378137
45    s  t  0.03233063 -1.08378137

You can then filter the rows with a constraint on the difference between source_att and target_att.
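As a minimal sketch of that filtering step, the block below copies the first six rows of the output above by hand and keeps the pairs whose values differ by at most 0.5. The names `pairs`, `max_gap` and `close_pairs` are made up for illustration, and the threshold is the one used as an example in the question.

```r
library(dplyr)

# First six rows of the projected pairs printed above, copied by hand.
pairs <- data.frame(
  from = rep("x", 6),
  to = c("y", "p", "u", "q", "w", "s"),
  source_att = rep(0.29773720, 6),
  target_att = c(0.50506407, -0.74022203, -2.04969760,
                 1.36281039, -0.47578690, 0.03233063)
)

max_gap <- 0.5  # hypothetical threshold, taken from the question's example

# Keep only the pairs whose two values are within max_gap of each other.
close_pairs <- pairs %>%
  filter(abs(source_att - target_att) <= max_gap)

close_pairs$to  # "y" "s"
```

The same filter() call can be applied unchanged to the full data frame produced by either version of the code above.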