I've got a dataset with distinct source and target nodes, as well as a numeric variable that's relevant to the relationship.
It looks a bit like this:
library(igraph)
library(tidygraph)
set.seed(24601)
example_data <-
  data.frame(source =
               sample(letters[1:10],
                      100,
                      replace = TRUE),
             target =
               sample(letters[16:25],
                      100,
                      replace = TRUE),
             important_variable =
               rnorm(100))
Imagine that the members of source are individuals, members of target are different cities that they've travelled to, and I want to create a network that shows when two given cities were visited by the same person. I'd use bipartite_projection() for this, like so:
example_data %>%
  graph_from_data_frame() %>%
  as_tbl_graph() %>%
  mutate(type =
           ifelse(name %in% letters[1:10],
                  TRUE,
                  FALSE)) %>%
  # cities are the nodes with type == FALSE
  bipartite_projection(which = "false")
However, I'd like to connect two cities only when a certain condition is met: for example, when the difference in their values of important_variable is at most 0.5 (say, I'm interested in cases where two cities were visited by the same person in the same year). At the moment, the information in important_variable is discarded by bipartite_projection().
I can't see a means of restricting the bipartite_projection based on a third numeric variable. Is it possible to do so? Thanks in advance for any help.
Update with edit to show the desired output:
Let's look at a small number of rows:
example_data %>%
  filter(source == "a") %>%
  head()
This produces the following:
source target important_variable
1 a x 0.29773720
2 a p 1.50474490
3 a y 0.01149263
4 a q 0.19391773
5 a t -0.10656946
6 a w -0.29516668
I can go straight into a bipartite projection, like so:
example_data %>%
  filter(source == "a") %>%
  head() %>%
  graph_from_data_frame() %>%
  as_tbl_graph() %>%
  mutate(type =
           ifelse(name %in% letters[1:10],
                  TRUE,
                  FALSE)) %>%
  bipartite_projection(which = "false")
which produces an igraph object with one vertex attribute (name) and one edge attribute (weight).
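A quick way to check which attributes survive the projection, as a minimal sketch on hypothetical toy data (the `toy` data frame and the `city_net` name are illustrative, not part of the dataset above):

```r
library(igraph)

# Hypothetical toy data: one person "a" visiting three cities
toy <- data.frame(source = c("a", "a", "a"),
                  target = c("x", "p", "y"),
                  important_variable = c(0.30, 1.50, 0.01))

g <- graph_from_data_frame(toy, directed = FALSE)

# TRUE for people, FALSE for cities
V(g)$type <- V(g)$name %in% toy$source

# Project onto the cities (the type == FALSE nodes)
city_net <- bipartite_projection(g, which = "false")

# List the attributes that the projected graph carries
vertex_attr_names(city_net)
edge_attr_names(city_net)
```

The edge attribute important_variable from the bipartite graph should not appear in the second list, which is the problem described here.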
However, I'd like something that looks like this (just the first four rows for simplicity):
source_projected target_projected source_att target_att
1 x p 0.2977372 1.50474490
2 x y 0.2977372 0.01149263
3 x q 0.2977372 0.19391773
4 x t 0.2977372 -0.10656946
as this would allow me to filter based on the relationship between my source_att and target_att columns (for example, keeping rows where the difference between source_att and target_att is less than 0.5).
Second update, with more detailed desired output
@ThomasIsCoding has provided a solution that fits with my request. This has made me realise that I wasn't sufficiently detailed.
Starting again with the original data, we can see that a is linked to p twice, and a is linked to y twice. In each case, the value of important_variable is different. See below:
example_data %>%
  filter(source == "a" &
           (target == "p" |
              target == "y"))
source target important_variable
1 a p 1.50474490
2 a y 0.01149263
3 a y -2.34069094
4 a p 0.29294049
The example desired data that I posted only includes each node within target being connected once. However, because the values of important_variable differ, I'd like output that includes all configurations of those pairings, to look like so:
source_projected target_projected source_att target_att
1 p y 1.5047449 0.01149263
2 p y 1.5047449 -2.34069094
3 p y 0.2929405 0.01149263
4 p y 0.2929405 -2.34069094
Is this something that's possible to construct? Thanks!
Update
Since you may have multiple values for a single target, I guess it would be better to use left_join and set the relationship argument to "many-to-many".
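A minimal sketch of what such a self-join could look like, assuming dplyr 1.1.0 or later (where left_join() has the relationship argument). The `toy` data frame copies the four rows shown in the question; the `_1`/`_2` suffixes and the `pairs` name are illustrative choices:

```r
library(dplyr)

# Hypothetical toy data mirroring the four rows shown in the question
toy <- data.frame(
  source = c("a", "a", "a", "a"),
  target = c("p", "y", "y", "p"),
  important_variable = c(1.50474490, 0.01149263, -2.34069094, 0.29294049)
)

pairs <- toy %>%
  # Join the table to itself on the person, keeping every combination of visits
  left_join(toy, by = "source", relationship = "many-to-many",
            suffix = c("_1", "_2")) %>%
  # Keep each unordered pair of distinct cities once
  filter(target_1 < target_2) %>%
  transmute(source_projected = target_1,
            target_projected = target_2,
            source_att = important_variable_1,
            target_att = important_variable_2)

pairs
```

Each visit to p is paired with each visit to y by the same person, so the two p rows and two y rows yield four p-y rows.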
Previous
Probably you can try the code below
which gives
and then I guess you know how to filter the rows with a constraint on the difference between source_att and target_att.
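For completeness, a sketch of that last filtering step, assuming a pair table in the layout requested in the question. The `pairs` data frame below is typed in by hand for illustration, and the 0.5 threshold is the one from the question:

```r
library(dplyr)
library(igraph)

# Hypothetical pair table in the layout requested in the question
pairs <- data.frame(
  source_projected = c("p", "p", "p", "p"),
  target_projected = c("y", "y", "y", "y"),
  source_att = c(1.50474490, 1.50474490, 0.29294049, 0.29294049),
  target_att = c(0.01149263, -2.34069094, 0.01149263, -2.34069094)
)

# Keep only pairs whose values differ by at most 0.5
close_pairs <- pairs %>%
  filter(abs(source_att - target_att) <= 0.5)

# Build the city network; the remaining columns become edge attributes
city_net <- graph_from_data_frame(close_pairs, directed = FALSE)
```

Because graph_from_data_frame() treats the first two columns as the edge list and the rest as edge attributes, source_att and target_att stay available on the edges of the filtered network.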