Break apart uneven list into dataframes to use in ggplot2

120 views Asked by At

I have a huge list of various tibbles that I outputted from running a for loop. The tibbles in this list are the result of using count() and group_by_(), and show the number of people by category a who do and do not work at various facilities, but each element of the list has a variable number of rows and one column where the name differs.

These are three elements of my (20+ element) list:

library(plyr)    
library(tidyverse)

(Hospital1 <- tibble(a = c("In.City", "In.City", "Likely Move", "Likely Move", "Possibly Move", "Unlikely Move", "Unlikely Move", NA), 
                     Something.Hospital1 = c("Hospital1", NA, "Hospital1", NA, NA, "Hospital1", NA, NA), 
                     n= c(sample(1:100, 8, replace = TRUE))))
(Hospital2 <- tibble(a = c("In.City", "In.City", "Likely Move", "Likely Move", "Possibly Move", "Unlikely Move", NA),
                     Something.Hospital2 = c("Hospital2", NA, "Hospital2", NA, NA, NA, NA), 
                     n= c(sample(1:100, 7, replace = TRUE))))
(Hospital3 <- tibble(a = c("In.City", "Likely Move", "Unlikely Move", "Possibly Move", NA),
                     Something.Hospital3 = as.character(c(NA, NA, NA, NA, NA)), 
                     n= c(sample(1:100, 5, replace = TRUE))))

#What my data actually looks like
(example.list <- list(Hospital1, Hospital2, Hospital3))

The values of n in the 3rd column that are not next to the NA in the second column of each list element matter most. I am so far really struggling to get these into a form that ggplot2 can use them. Because there are about 20 list elements I'm working with, I feel I need to use list structure in the code and be able to iterate or make a function if need be. I've tried to:

1) Use cbind(example.list[[1:3]]) and bind_cols([[1:3]]), to bind them into one tibble, though both fail because of the varying number of rows (same with single brackets).

2) Graph those elements individually, though ggplot2 fails with lists.

3) Split them into individual dataframes, using list2env(example.list[1:3], envir = .GlobalEnv), but that gave me the error

names(x) must be a character vector of the same length as x

4) Doing a for loop left_join() but to do so, I would need a key, which I could form using something similar to cumsum() but for character cells. That would have to look like this

              a Something.Hospital1     n match.key
          <chr>               <chr> <int> <dbl>
1       In.City           Hospital1    40     1
2       In.City                <NA>    25     2
3   Likely Move           Hospital1    17     1
4   Likely Move                <NA>    56     2
5 Possibly Move           Hospital1    59     1

5) I tried using ddply(output[[10]], .(crosstab, n), nrow), but V1 isn't helpful for merging like this.

       crosstab   n V1
1     In.Camden  14  1
2     In.Camden 139  1
3   Likely Move   4  1
4   Likely Move 171  1

In my mind, I would like my data to look like this, with all of the possibilities listed. I could then use tidyr() to get it into ggplot form

             a Something.Hospital1    n1 Something.Hospital2    n2 Something.Hospital3    n3
          <chr>               <chr> <int>               <chr> <int>               <chr> <int>
1       In.City           Hospital1    39           Hospital2    24           Hospital3    13
2       In.City                <NA>    86                <NA>    82                <NA>    85
3   Likely Move           Hospital1    77           Hospital2    16           Hospital3    17
4   Likely Move                <NA>     4                <NA>    78                <NA>    49
5 Possibly Move           Hospital1   100           Hospital2    79           Hospital3    95
6 Possibly Move                <NA>    49                <NA>    31                <NA>    62
7 Unlikely Move           Hospital1    82           Hospital2     3           Hospital3    96
8 Unlikely Move                <NA>    27                <NA>    30                <NA>    39
9          <NA>                <NA>    55                <NA>    96                <NA>    47

I don't understand lapply and related at all, so please be clear about how to use those if you suggest them. I prefer dplyr because that's what I know best.

Thanks so much for your help

1

There are 1 answers

2
aosmith On BEST ANSWER

You could rename the column in each tibble to the same thing, which would open the door for binding rows and getting the dataset in the long format without needing to reshape later.

You can loop through the list using lapply or purrr::map, renaming the column in each list element. I'll use map_dfr (map_df in older purrr versions) because it will use bind_rows to bind the datasets together after looping through the list. map_dfr has an .id argument so we can add a column to tell the datasets apart. This may be useful here, but can be skipped.

You can change the column name via rename_at from dplyr. Because you want to rename a single column, I'll name them all to a new name using paste0.

Here's the code to loop through each element of example.list and rename all columns that contain the word "Hospital" with "Something.Hospital".

map_dfr(example.list, ~rename_at(.x, vars( contains( "Hospital") ), 
                                 funs( paste0("Something.Hospital") ) ), 
        .id = "group" )

# A tibble: 20 x 4
   group             a Something.Hospital     n
   <chr>         <chr>              <chr> <int>
 1     1       In.City          Hospital1    31
 2     1       In.City               <NA>    81
 3     1   Likely Move          Hospital1    71
 4     1   Likely Move               <NA>    87
 5     1 Possibly Move               <NA>     6
 6     1 Unlikely Move          Hospital1    16
 7     1 Unlikely Move               <NA>    16
 8     1          <NA>               <NA>    92
 9     2       In.City          Hospital2    98
...