Create bipartite projection based on condition applied to two edges

Question

Asked by markrt on 10 November 2023 at 16:33 · 62 views

I've got a dataset with distinct source and target nodes, as well as a numeric variable that's relevant to the relationship.

It looks a bit like this:

library(igraph)
library(tidygraph)

set.seed(24601)

example_data <- 
  data.frame(source = 
             sample(letters[1:10],
                    100,
                    replace = TRUE),
           target =
             sample(letters[16:25],
                    100,
                    replace = TRUE),
           important_variable =
             rnorm(100))

Imagine that the members of source are individuals, members of target are different cities that they've travelled to, and I want to create a network that shows when two given cities were visited by the same person. I'd use bipartite_projection() for this, like so:


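example_data %>% 
  graph_from_data_frame() %>% 
  as_tbl_graph() %>% 
  # type is TRUE for people (source) and FALSE for cities (target)
  mutate(type = name %in% letters[1:10]) %>% 
  # project onto the FALSE-type vertices, i.e. the cities
  bipartite_projection(which = "false")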

However, I'd like to connect two cities only when a certain condition is met: for example, when the values of important_variable for the two visits differ by at most 0.5 (say, I'm interested in when two cities have been visited by the same person in the same year). At the moment, the information in important_variable is discarded by bipartite_projection().

I can't see a means of restricting the bipartite_projection based on a third numeric variable. Is it possible to do so? Thanks in advance for any help.

Update with edit to show the desired output:

Let's look at a small number of rows:

example_data %>% 
  filter(source == "a") %>% 
  head()

This produces the following:

  source target important_variable
1      a      x         0.29773720
2      a      p         1.50474490
3      a      y         0.01149263
4      a      q         0.19391773
5      a      t        -0.10656946
6      a      w        -0.29516668

I can go straight into a bipartite projection, like so:

example_data %>% 
  filter(source == "a") %>% 
  head()  %>% 
  graph_from_data_frame() %>% 
  as_tbl_graph() %>% 
  mutate(type = 
           ifelse(name %in% letters[1:10],
                  TRUE,
                  FALSE)) %>% 
  bipartite_projection(which = "false")

which produces an igraph object with one vertex attribute (name) and one edge attribute (weight).

However, I'd like something that looks like this (just the first four rows for simplicity):

  source_projected target_projected source_att  target_att
1                x                p  0.2977372  1.50474490
2                x                y  0.2977372  0.01149263
3                x                q  0.2977372  0.19391773
4                x                t  0.2977372 -0.10656946

as this would allow me to filter based on the relationship between my source_att and target_att columns (for example, keeping only rows where the difference between source_att and target_att is less than 0.5).

Second update, with more detailed desired output

@ThomasIsCoding has provided a solution that fits with my request. This has made me realise that I wasn't sufficiently detailed.

Starting again with the original data, we can see that a is linked to p twice, and a is linked to y twice. In each case, the value of important_variable is different. See below:

example_data %>% 
  filter(source == "a" &
           (target == "p" |
              target == "y")) 

  source target important_variable
1      a      p         1.50474490
2      a      y         0.01149263
3      a      y        -2.34069094
4      a      p         0.29294049

The desired output that I posted earlier connects each pair of nodes within target only once. However, because the values of important_variable differ between visits, I'd like output that includes every combination of those values for each pairing, like so:

  source_projected target_projected source_att  target_att
1                p                y  1.5047449  0.01149263
2                p                y  1.5047449 -2.34069094
3                p                y  0.2929405  0.01149263
4                p                y  0.2929405 -2.34069094

Is this something that's possible to construct? Thanks!

Original Q&A

There is 1 answer

**ThomasIsCoding** · Accepted Answer · 2023-11-13T11:43:45+00:00

Update

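Since you may have multiple values for a single target, I guess it would be better to use left_join() and set relationship = "many-to-many", so that every value recorded for a target is kept:

library(dplyr)  # left_join(), join_by(), select(), rename()
library(purrr)  # pluck()

out <- example_data %>%
    graph_from_data_frame() %>%
    # mark the target vertices (cities) as type TRUE so they end up in proj2
    set_vertex_attr(
        name = "type",
        value = names(V(.)) %in% example_data$target
    ) %>%
    bipartite_projection() %>%
    pluck("proj2") %>%
    # igraph:: prefix in case another attached package also exports as_data_frame()
    igraph::as_data_frame() %>%
    select(-weight) %>%
    # attach every value recorded for the "from" city
    left_join(select(example_data, -source),
        join_by(from == target),
        relationship = "many-to-many"
    ) %>%
    # attach every value recorded for the "to" city
    left_join(select(example_data, -source),
        join_by(to == target),
        relationship = "many-to-many"
    ) %>%
    rename(all_of(c(source_att = "important_variable.x", target_att = "important_variable.y")))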

and you will see

> head(out)
  from to source_att  target_att
1    x  y  0.2977372  0.50506407
2    x  y  0.2977372 -1.37333412
3    x  y  0.2977372  0.61981223
4    x  y  0.2977372  0.43724194
5    x  y  0.2977372 -1.97363488
6    x  y  0.2977372 -0.02413137

> glimpse(out)
Rows: 4,462
Columns: 4
$ from       <chr> "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",…
$ to         <chr> "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",…
$ source_att <dbl> 0.2977372, 0.2977372, 0.2977372, 0.2977372, 0.2977372, 0.29…
$ target_att <dbl> 0.50506407, -1.37333412, 0.61981223, 0.43724194, -1.9736348…
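
Note that the two joins above pair every value recorded for one city with every value recorded for the other, regardless of which person made the visit. If the condition should compare only visits made by the same person, one possible alternative (a sketch, not part of the accepted answer) is to skip the projection and self-join the edge list on source. The toy data frame `visits`, its values, and the `_from`/`_to` suffixes are assumptions made for illustration; the 0.5 threshold comes from the question.

```r
library(dplyr)

# Toy edge list in the same shape as example_data (values are made up)
visits <- data.frame(
  source = c("a", "a", "a", "b", "b"),
  target = c("p", "y", "y", "p", "y"),
  important_variable = c(1.50, 0.01, 1.20, 0.00, 0.40)
)

same_person_pairs <- visits %>%
  # pair every visit with every other visit by the same person
  inner_join(visits,
    by = "source",
    suffix = c("_from", "_to"),
    relationship = "many-to-many"
  ) %>%
  # keep each unordered pair of distinct cities once
  filter(target_from < target_to) %>%
  # keep only pairs of visits whose values differ by at most 0.5
  filter(abs(important_variable_from - important_variable_to) <= 0.5) %>%
  select(source,
    from = target_from, to = target_to,
    source_att = important_variable_from,
    target_att = important_variable_to
  )

same_person_pairs
```

Each remaining row is one pair of visits by a single person that satisfies the condition, so passing the from and to columns to graph_from_data_frame(directed = FALSE) would give a city network restricted to those pairs.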

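Previous

If each target had only a single value, match() would be enough. Note that match() returns only the first matching row, so just one value of important_variable is kept per target:

example_data %>%
    graph_from_data_frame() %>%
    # mark the target vertices (cities) as type TRUE so they end up in proj2
    set_vertex_attr(
        name = "type",
        value = names(V(.)) %in% example_data$target
    ) %>%
    bipartite_projection() %>%
    pluck("proj2") %>%
    # igraph:: prefix in case another attached package also exports as_data_frame()
    igraph::as_data_frame() %>%
    select(-weight) %>%
    mutate(
        # first recorded value for each endpoint of the projected edge
        source_att = with(example_data, important_variable[match(from, target)]),
        target_att = with(example_data, important_variable[match(to, target)])
    )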

which gives

   from to  source_att  target_att
1     x  y  0.29773720  0.50506407
2     x  p  0.29773720 -0.74022203
3     x  u  0.29773720 -2.04969760
4     x  q  0.29773720  1.36281039
5     x  w  0.29773720 -0.47578690
6     x  s  0.29773720  0.03233063
7     x  t  0.29773720 -1.08378137
8     x  r  0.29773720 -0.72029435
9     x  v  0.29773720 -0.22919308
10    y  p  0.50506407 -0.74022203
11    y  u  0.50506407 -2.04969760
12    y  q  0.50506407  1.36281039
13    y  w  0.50506407 -0.47578690
14    y  s  0.50506407  0.03233063
15    y  t  0.50506407 -1.08378137
16    y  r  0.50506407 -0.72029435
17    y  v  0.50506407 -0.22919308
18    p  u -0.74022203 -2.04969760
19    p  q -0.74022203  1.36281039
20    p  w -0.74022203 -0.47578690
21    p  s -0.74022203  0.03233063
22    p  t -0.74022203 -1.08378137
23    p  r -0.74022203 -0.72029435
24    p  v -0.74022203 -0.22919308
25    r  u -0.72029435 -2.04969760
26    r  q -0.72029435  1.36281039
27    r  w -0.72029435 -0.47578690
28    r  s -0.72029435  0.03233063
29    r  t -0.72029435 -1.08378137
30    r  v -0.72029435 -0.22919308
31    u  q -2.04969760  1.36281039
32    u  w -2.04969760 -0.47578690
33    u  s -2.04969760  0.03233063
34    u  t -2.04969760 -1.08378137
35    u  v -2.04969760 -0.22919308
36    v  s -0.22919308  0.03233063
37    v  t -0.22919308 -1.08378137
38    v  q -0.22919308  1.36281039
39    v  w -0.22919308 -0.47578690
40    q  w  1.36281039 -0.47578690
41    q  s  1.36281039  0.03233063
42    q  t  1.36281039 -1.08378137
43    w  s -0.47578690  0.03233063
44    w  t -0.47578690 -1.08378137
45    s  t  0.03233063 -1.08378137

and then I guess you know how to filter the rows with a constraint on the difference between source_att and target_att.
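
For completeness, that filtering step could look something like the sketch below. The data frame `pairs` is a hypothetical stand-in for the projected output (its values are made up), and `max_gap` is an assumed name for the 0.5 threshold mentioned in the question.

```r
library(dplyr)
library(igraph)

# Hypothetical stand-in for the projected data frame: one row per pair of cities
pairs <- data.frame(
  from       = c("x", "x", "p"),
  to         = c("p", "y", "y"),
  source_att = c(0.30, 0.30, 1.50),
  target_att = c(1.50, 0.01, 0.01)
)

max_gap <- 0.5  # assumed threshold, taken from the question's example

# Keep only the pairs whose two values differ by at most max_gap
close_pairs <- pairs %>%
  filter(abs(source_att - target_att) <= max_gap)

# Collapse repeated pairs and rebuild an undirected graph of the cities
city_graph <- close_pairs %>%
  distinct(from, to) %>%
  graph_from_data_frame(directed = FALSE)
```

The distinct() call matters when the many-to-many version is used, since the same pair of cities can then appear in several rows.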
