My TeamName column does not uniquely identify teams, so I need to identify the unique teams from the RaterID and RateeID columns instead. My data is dyadic, collected in a round-robin style within each team: if a number in the RaterID column also appears in the RateeID column, both people are on the same team. That overlap is the only way to distinguish between teams. I figured I could create a new column that combines RaterID and RateeID, then create a value (maybe using the rank function?) that would help me tell teams apart. My data contains over 3000 teams, so I thought I would first group_by TeamName, then examine the dyads for commonality in order to create a new column that I could later paste with TeamName to make a unique team ID. This is my first question on here, so hopefully I am articulating this well…

I am new to R and have no idea what to try...

Creating the data frame:

df<-data.frame(RaterID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 6, 6, 8, 8, 9, 9, 10, 10), RateeID = c(2, 3, 4, 1, 3, 4, 1, 2, 4, 6, 7, 5, 7, 9, 10, 8, 10, 8, 9), TeamName = c('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'))

Grouping by TeamName to make calculating the unique team ID easier on a large data set:

library(dplyr)
df %>% group_by(TeamName)

Here is where I am lost… How do I write a function that says: if a RaterID also occurs in RateeID within a group (i.e. TeamName), then create a unique identifier? Perhaps use the rank function? Then I could combine that identifier with TeamName and finally get a unique team ID.

My desired result is:

RaterID  RateeID TeamName UniqueTeamID   
 1          2       A          A1
 1          3       A          A1
 1          4       A          A1
 2          1       A          A1   
 2          3       A          A1
 2          4       A          A1   
 3          1       A          A1   
 3          2       A          A1   
 3          4       A          A1   
 5          6       A          A2   
 5          7       A          A2
 6          5       A          A2
 6          7       A          A2
 8          9       B          B1   
 8          10      B          B1   
 9          8       B          B1       
 9          10      B          B1   
 10         8       B          B1       
 10         9       B          B1       

1 Answer

Answered by camille

This question is more complex than it first seems: it isn't a simple ranking, but rather detection of groups within a network of raters and ratees. One approach, overkill for this small sample but appropriate for your full data, is to treat the data as a network (graph) whose disconnected subgraphs (connected components) are the teams. I'm not super skilled in network analysis, but I know enough to figure out what the subgraphs are, and tidygraph makes parts of this easy to fit into a dplyr workflow.

Make a graph of the data and plot it to confirm that it splits into separate subgraphs:

library(dplyr)
library(purrr)
library(igraph)
library(tidygraph)

rate_graph <- igraph::graph_from_data_frame(df)

plot(rate_graph)
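
With over 3000 teams the plot will be hard to read, so a numeric check may be more practical. The following is a minimal sketch, assuming the sample data from the question and igraph's count_components; the name n_groups is just an illustrative variable.

```r
library(igraph)

# Same sample data as in the question
df <- data.frame(
  RaterID  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 6, 6, 8, 8, 9, 9, 10, 10),
  RateeID  = c(2, 3, 4, 1, 3, 4, 1, 2, 4, 6, 7, 5, 7, 9, 10, 8, 10, 8, 9),
  TeamName = c(rep("A", 13), rep("B", 6))
)

# The first two columns are used as the edge list (rater -> ratee)
rate_graph <- graph_from_data_frame(df)

# Count weakly connected components, i.e. groups of people linked by
# at least one rating in either direction
n_groups <- count_components(rate_graph, mode = "weak")
n_groups
```

For the sample data this should report three groups, matching the three teams in the desired result.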

igraph::decompose splits the graph into its disconnected subgraphs and returns them as a list of igraph objects. Using purrr::map and tidygraph::as_tbl_graph, I convert each list item to a tbl_graph and then to a data frame of its nodes, and bind those by rows back into a single data frame. The point of doing it this way is the .id column, which records which list item (that is, which subgraph) each node came from.

groups <- decompose(rate_graph) %>%
  map(as_tbl_graph) %>%
  map_dfr(as_tibble, .id = "group_num") %>%
  mutate(name = as.numeric(name))
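
If you would rather avoid the purrr and tidygraph round trip, igraph can also report component membership directly. This is a sketch of an alternative, not the approach used above: it assumes igraph's components(), and groups_alt and comp are illustrative names. Note that group_num comes out numeric here rather than character.

```r
library(igraph)

# Same sample data as in the question
df <- data.frame(
  RaterID  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 6, 6, 8, 8, 9, 9, 10, 10),
  RateeID  = c(2, 3, 4, 1, 3, 4, 1, 2, 4, 6, 7, 5, 7, 9, 10, 8, 10, 8, 9),
  TeamName = c(rep("A", 13), rep("B", 6))
)

rate_graph <- graph_from_data_frame(df)

# components() returns, among other things, a membership vector with one
# component number per vertex, in vertex order
comp <- components(rate_graph, mode = "weak")

# One row per person: their ID and the component they belong to
groups_alt <- data.frame(
  name      = as.numeric(V(rate_graph)$name),
  group_num = as.vector(comp$membership)
)
groups_alt
```

The resulting table has the same role as groups: one row per person, with a group number that can be joined back onto the original data.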

Then I join that table of group IDs back to the original data by rater ID and, for each team name, count along the group IDs with dense_rank. That gives the team IDs A1, A2, and B1. Drop the extra columns if you need to.

df %>%
  left_join(groups, by = c("RaterID" = "name")) %>%
  group_by(TeamName) %>%
  mutate(subteam = dense_rank(group_num)) %>%
  mutate(team_id = paste0(TeamName, subteam)) %>%
  ungroup()
#> # A tibble: 19 x 6
#>    RaterID RateeID TeamName group_num subteam team_id
#>      <dbl>   <dbl> <fct>    <chr>       <int> <chr>  
#>  1       1       2 A        1               1 A1     
#>  2       1       3 A        1               1 A1     
#>  3       1       4 A        1               1 A1     
#>  4       2       1 A        1               1 A1     
#>  5       2       3 A        1               1 A1     
#>  6       2       4 A        1               1 A1     
#>  7       3       1 A        1               1 A1     
#>  8       3       2 A        1               1 A1     
#>  9       3       4 A        1               1 A1     
#> 10       5       6 A        2               2 A2     
#> 11       5       7 A        2               2 A2     
#> 12       6       5 A        2               2 A2     
#> 13       6       7 A        2               2 A2     
#> 14       8       9 B        3               1 B1     
#> 15       8      10 B        3               1 B1     
#> 16       9       8 B        3               1 B1     
#> 17       9      10 B        3               1 B1     
#> 18      10       8 B        3               1 B1     
#> 19      10       9 B        3               1 B1
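
To end up with exactly the columns from the desired result, the helper columns can be dropped with select. The sketch below is self-contained and, to keep it short, assumes igraph's components() for the group table rather than the decompose pipeline; UniqueTeamID is the column name from the question, while comp, groups and result are illustrative names.

```r
library(dplyr)
library(igraph)

# Same sample data as in the question
df <- data.frame(
  RaterID  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 5, 5, 6, 6, 8, 8, 9, 9, 10, 10),
  RateeID  = c(2, 3, 4, 1, 3, 4, 1, 2, 4, 6, 7, 5, 7, 9, 10, 8, 10, 8, 9),
  TeamName = c(rep("A", 13), rep("B", 6))
)

# Build the rater -> ratee graph and look up each person's component
rate_graph <- graph_from_data_frame(df)
comp <- components(rate_graph, mode = "weak")
groups <- data.frame(
  name      = as.numeric(V(rate_graph)$name),
  group_num = as.vector(comp$membership)
)

result <- df %>%
  left_join(groups, by = c("RaterID" = "name")) %>%
  group_by(TeamName) %>%
  # Number the components within each team name, then prefix with the name
  mutate(UniqueTeamID = paste0(TeamName, dense_rank(group_num))) %>%
  ungroup() %>%
  # Keep only the columns shown in the desired result
  select(RaterID, RateeID, TeamName, UniqueTeamID)

result
```

Joining on RaterID alone is enough here because every row has a rater, and the rater and ratee of a row always fall in the same component.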