How do I set up data for tidygraph and ggraph?


I want to run a network analysis but am completely lost on how to structure my data correctly, since most examples start with data that is already in from/to form.

An example of my data looks like:

df <- data.frame(Name = c("Alice", "Ben", "Tom", "Jane", "Neil", "Alice", "Tom", "Ben", "Jane", "Neil", "Alice", "Tom", "Ben", "Jane", "Bob"),
         Location = c("Ward", "Desk", "Op", "Call", "Off",
                      "Ward", "Desk", "Op", "Call", "Off",
                      "Ward", "Desk", "Op", "Call", "Off"),
         Rating = c(1, 1, 1, 1, 1, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8))

I now want the to and from combinations of people, as denoted by Name, within every Rating. Note that people can be at a different Location for a different Rating; ideally I'd like Location, in combination with Name, to be the nodes and Rating to be the edges.

I have looked at library(iterpc) but am struggling to comprehend the whole combination thing, with five different lineups.

Is there a potential dplyr solution to my problem? Thank you!

EDIT: It looks as though my question is very similar to this, yet the accepted answer does not work for me; instead I get Error: Column name Name must not be duplicated.

1 answer

Eric Leung (best answer)

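as_tbl_graph() treats the first two columns of a data frame as the from and to columns and keeps any remaining columns as edge attributes. With your data, that means Name becomes from, Location becomes to, and Rating is stored on each edge, so tidygraph does this mapping for you.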

library(tidygraph)
#> Warning: package 'tidygraph' was built under R version 3.6.3
#> 
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#> 
#>     filter

df <- data.frame(
  Name = c(
    "Alice", "Ben", "Tom", "Jane", "Neil",
    "Alice", "Tom", "Ben", "Jane", "Neil",
    "Alice", "Tom", "Ben", "Jane", "Bob"
  ),
  Location = c(
    "Ward", "Desk", "Op", "Call", "Off",
    "Ward", "Desk", "Op", "Call", "Off",
    "Ward", "Desk", "Op", "Call", "Off"
  ),
  Rating = c(
    1, 1, 1, 1, 1,
    10, 10, 10, 10, 10,
    8, 8, 8, 8, 8)
)

tg <- as_tbl_graph(df)
tg
#> # A tbl_graph: 11 nodes and 15 edges
#> #
#> # A directed acyclic multigraph with 4 components
#> #
#> # Node Data: 11 x 1 (active)
#>   name 
#>   <chr>
#> 1 Alice
#> 2 Ben  
#> 3 Tom  
#> 4 Jane 
#> 5 Neil 
#> 6 Bob  
#> # ... with 5 more rows
#> #
#> # Edge Data: 15 x 3
#>    from    to Rating
#>   <int> <int>  <dbl>
#> 1     1     7      1
#> 2     2     8      1
#> 3     3     9      1
#> # ... with 12 more rows

You can double-check that the mapping is correct by looking at the first row of the edge table: it shows an edge between nodes 1 and 7, which are Alice and Ward, matching the first row of your original data frame.

data.frame(tg)
#>     name
#> 1  Alice
#> 2    Ben
#> 3    Tom
#> 4   Jane
#> 5   Neil
#> 6    Bob
#> 7   Ward
#> 8   Desk
#> 9     Op
#> 10  Call
#> 11   Off
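
Since the question also mentions ggraph, here is a minimal plotting sketch for the same graph. It is not part of the reprex above; it assumes the ggraph package is installed, and the layout ("fr") and the choice to colour edges by Rating are only illustrative.

```r
library(tidygraph)
library(ggraph)

# Same data as above; stringsAsFactors = FALSE keeps names as character
# vectors on R versions before 4.0.
df <- data.frame(
  Name = c("Alice", "Ben", "Tom", "Jane", "Neil",
           "Alice", "Tom", "Ben", "Jane", "Neil",
           "Alice", "Tom", "Ben", "Jane", "Bob"),
  Location = rep(c("Ward", "Desk", "Op", "Call", "Off"), times = 3),
  Rating = rep(c(1, 10, 8), each = 5),
  stringsAsFactors = FALSE
)

# First two columns (Name, Location) become the edge endpoints.
tg <- as_tbl_graph(df)

# geom_edge_fan() spreads parallel edges apart, which helps here because
# the same Name/Location pair can appear under several ratings.
p <- ggraph(tg, layout = "fr") +
  geom_edge_fan(aes(colour = factor(Rating))) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)

p
```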

Created on 2020-09-21 by the reprex package (v0.3.0)
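
If what you are after is person-to-person edges (everyone who shares a Rating is connected), one possible dplyr approach is to join the data to itself on Rating and keep each unordered pair once. This is only a sketch: the names edges and people_graph are made up for illustration, and it assumes each person appears at most once per Rating, as in your sample. Recent dplyr versions may warn about a many-to-many join here, which is expected for this kind of self-join.

```r
library(dplyr)
library(tidygraph)

df <- data.frame(
  Name = c("Alice", "Ben", "Tom", "Jane", "Neil",
           "Alice", "Tom", "Ben", "Jane", "Neil",
           "Alice", "Tom", "Ben", "Jane", "Bob"),
  Rating = rep(c(1, 10, 8), each = 5),
  stringsAsFactors = FALSE  # needed for the "<" comparison on R < 4.0
)

# Self-join on Rating so every person is paired with every other person
# who has the same Rating. The suffixes keep the two Name columns distinct.
edges <- df %>%
  inner_join(df, by = "Rating", suffix = c("_from", "_to")) %>%
  # Keep each unordered pair once and drop self-pairs.
  filter(Name_from < Name_to) %>%
  select(from = Name_from, to = Name_to, Rating)

# Undirected graph: nodes are people, Rating is an edge attribute.
people_graph <- as_tbl_graph(edges, directed = FALSE)
people_graph
```

To use the Name and Location combination as nodes instead, you could build a combined key (for example with paste(Name, Location)) before the join and pair on that column.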