Iterated plotting from list of list of dataframes


I'm trying to explore a large dataset, both with data frames and with charts. I'd like to analyze the distribution of each variable by different metrics (e.g., sum(x), sum(x*y)) and for different sub-populations. I have 4 sub-populations, 2 metrics, and many variables.

In order to accomplish that, I've made a list structure such as this:

$variable1
...$metric1     <--- that's a df.
...$metric2
$variable2
...$metric1
...$metric2

Inside each data frame (e.g., list$variable1$metric1), I've calculated the distribution of variable1's unique values for each of the four population groups (one group per column). It looks like this:

$variable1$metric1
        unique_values med_all med_some_not_all med_at_least_some med_none
1 (1) 12-17 Years Old      NA               NA                NA       NA
2 (2) 18-25 Years Old   0.278            0.317             0.278    0.317
3 (3) 26-34 Years Old   0.225            0.228             0.225    0.228
4     (4) 35 or Older   0.497            0.456             0.497    0.456


$variable1$metric2
        unique_values med_all med_some_not_all med_at_least_some med_none
1 (1) 12-17 Years Old      NA               NA                NA       NA
2 (2) 18-25 Years Old   0.544            0.406             0.544    0.406
3 (3) 26-34 Years Old   0.197            0.310             0.197    0.310
4     (4) 35 or Older   0.259            0.284             0.259    0.284

What I'm trying to figure out is a good way to loop through the list of lists (probably melting the DFs in the process) and then output a ton of bar charts. In this case, the natural plot format would be, for each dataframe, a stacked bar chart with one stacked bar for each sub-population, grouping by the variable's unique values.

But I'm not familiar with iterated plotting, so I've hit a dead end. How might I plot from that list structure? Alternatively, is there a better structure in which I should be storing this information?

There are 3 answers

Pierre L On BEST ANSWER

Here's a start:

lst <- list(alpha= list(a= data.frame(matrix(1:4, 2)), b= data.frame(matrix(6:11, 2))), 
                          beta = list(c = data.frame(matrix(11:14, 2))))

lst
$alpha
$alpha$a
  X1 X2
1  1  3
2  2  4

$alpha$b
  X1 X2 X3
1  6  8 10
2  7  9 11


$beta
$beta$c
  X1 X2
1 11 13
2 12 14

#We can subset by number or by name
lst[['alpha']]
$a
  X1 X2
1  1  3
2  2  4

$b
  X1 X2 X3
1  6  8 10
2  7  9 11

lst[[1]]
$a
  X1 X2
1  1  3
2  2  4

$b
  X1 X2 X3
1  6  8 10
2  7  9 11

#The dollar sign naming convention reminds us that we are looking at a list.
#Let's sum the columns of both data frames in the alpha list
lapply(lst[['alpha']], colSums)
$a
X1 X2 
 3  7 

$b
X1 X2 X3 
13 17 21 

Let's try to find the sum of each column of each data frame:

lapply(lst, colSums)
Error in FUN(X[[i]], ...) : 
  'x' must be an array of at least two dimensions

What happened? R is correctly refusing to run a matrix function on a list. colSums needs a data frame, a matrix, or another array with at least two dimensions, but each element of lst is itself a list of data frames. We have to nest one lapply call inside another, and the logic can get complicated:

lapply(lst, function(x) lapply(x, colSums))
$alpha
$alpha$a
X1 X2 
 3  7 

$alpha$b
X1 X2 X3 
13 17 21 


$beta
$beta$c
X1 X2 
23 27 
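
As an alternative to nesting lapply, the list can be flattened by one level first. This is a minimal sketch using base R's unlist(recursive = FALSE); the names flat and sums are only illustrative and not part of the original answer.

```r
lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)),
                         b = data.frame(matrix(6:11, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))

# Remove only the outer level of nesting. The data frames themselves are kept
# and receive combined names such as "alpha.a".
flat <- unlist(lst, recursive = FALSE)
names(flat)

# A single lapply now reaches every data frame.
sums <- lapply(flat, colSums)
sums[["beta.c"]]
```

The combined names keep track of which variable and metric each data frame came from, which may be enough when only one function has to be applied to every data frame.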

We can use rbind to put data.frames together:

rbind(lst$alpha$a, lst$beta$c)
  X1 X2
1  1  3
2  2  4
3 11 13
4 12 14

Be sure not to do it the way you might be thinking (I've done it many times):

do.call(rbind, lst)
      a      b     
alpha List,2 List,3
beta  List,2 List,2

That isn't the result you're looking for. And make sure that the number of columns and the column names are the same:

do.call(rbind, lst[[1]])
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

R is refusing to combine data frames with different numbers of columns: two in one (alpha$a) and three in the other (alpha$b).
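
One way to catch this before calling rbind is to compare the column names up front. The following is a rough sketch in base R; cols and same_cols are illustrative names, not something from the original answer.

```r
lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)),
                         b = data.frame(matrix(6:11, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))

# Compare each data frame's column names with those of the first one.
cols <- lapply(lst[["alpha"]], names)
same_cols <- vapply(cols, identical, logical(1), cols[[1]])
same_cols

# Only bind when every data frame has the same columns.
if (all(same_cols)) {
  combined <- do.call(rbind, lst[["alpha"]])
} else {
  message("Column names differ in: ",
          paste(names(same_cols)[!same_cols], collapse = ", "))
}
```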

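I changed lst so that alpha$b has two columns like the others (stored as lst2) and combined them:

# lst2 is reconstructed from the output below: alpha$b becomes a 3 x 2 data frame
lst2 <- lst
lst2$alpha$b <- data.frame(matrix(6:11, 3))
bind1 <- lapply(lst2, function(x) do.call(rbind, x))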
bind1
$alpha
    X1 X2
a.1  1  3
a.2  2  4
b.1  6  9
b.2  7 10
b.3  8 11

$beta
    X1 X2
c.1 11 13
c.2 12 14

That combines the elements of each list. Now I can combine the outer list to make one big data frame.

do.call(rbind, bind1)
          X1 X2
alpha.a.1  1  3
alpha.a.2  2  4
alpha.b.1  6  9
alpha.b.2  7 10
alpha.b.3  8 11
beta.c.1  11 13
beta.c.2  12 14
baptiste On

Here's a strategy based on melting a list (recursively):

lst = list(alpha= list(a= data.frame(matrix(1:4, 2)), 
                       b= data.frame(matrix(6:11, 2))), 
           beta = list(c = data.frame(matrix(11:14, 2))))

library(reshape2)
m = melt(lst, id=1:2)
library(ggplot2)
ggplot(m, aes(X1,X2)) + geom_bar(stat="identity") + facet_grid(L1~L2)
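
If reshape2 is not available, a similar long layout for a single data frame can be approximated with base R's stack(). This is only a sketch under that assumption: the data frame below is copied from the question (variable1, metric1), and group_cols and long are illustrative names.

```r
# One data frame from the question (variable1, metric1).
df <- data.frame(
  unique_values = c("(1) 12-17 Years Old", "(2) 18-25 Years Old",
                    "(3) 26-34 Years Old", "(4) 35 or Older"),
  med_all           = c(NA, 0.278, 0.225, 0.497),
  med_some_not_all  = c(NA, 0.317, 0.228, 0.456),
  med_at_least_some = c(NA, 0.278, 0.225, 0.497),
  med_none          = c(NA, 0.317, 0.228, 0.456),
  stringsAsFactors = FALSE
)

group_cols <- setdiff(names(df), "unique_values")

# stack() places the four group columns under each other: "values" holds the
# numbers and "ind" the name of the column each number came from.
long <- data.frame(
  unique_values = rep(df$unique_values, times = length(group_cols)),
  stack(df[, group_cols])
)
head(long)
```

The result has one row per combination of unique value and sub-population, which is the shape that ggplot2-style plotting generally expects.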
josliber On

I find nested lists to be pretty tricky to work with, so I would combine them all into a single data frame that labels the name of the variable and the name of the metric:

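lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)),
                         b = data.frame(matrix(6:9, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))
# Add a "metric" column to each data frame, then stack the data frames of each variable
level1 <- lapply(lst, function(x) {
  do.call(rbind, lapply(names(x), function(y) { x[[y]]$metric <- y; x[[y]] }))
})
# Add a "variable" column to each stacked data frame, then stack everything
dat <- do.call(rbind, lapply(names(level1), function(x) {
  level1[[x]]$variable <- x; level1[[x]]
}))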
dat
#   X1 X2 metric variable
# 1  1  3      a    alpha
# 2  2  4      a    alpha
# 3  6  8      b    alpha
# 4  7  9      b    alpha
# 5 11 13      c     beta
# 6 12 14      c     beta

Now you can use standard tools for manipulating a single data frame to perform your data analysis.
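
To connect this back to the stacked bar charts the question asks for, here is a hedged sketch that applies the same labelling step to the question's layout and then draws one chart per variable/metric pair with base R's barplot(). The sample values are copied from the question; the helper make_df, the names dists, groups and pieces, and the output file name are assumptions made for illustration.

```r
ages <- c("(1) 12-17 Years Old", "(2) 18-25 Years Old",
          "(3) 26-34 Years Old", "(4) 35 or Older")
groups <- c("med_all", "med_some_not_all", "med_at_least_some", "med_none")

# Hypothetical helper: in the question's sample, columns 1/3 and 2/4 coincide.
make_df <- function(a, b) {
  data.frame(unique_values = ages, med_all = a, med_some_not_all = b,
             med_at_least_some = a, med_none = b, stringsAsFactors = FALSE)
}
dists <- list(variable1 = list(
  metric1 = make_df(c(NA, 0.278, 0.225, 0.497), c(NA, 0.317, 0.228, 0.456)),
  metric2 = make_df(c(NA, 0.544, 0.197, 0.259), c(NA, 0.406, 0.310, 0.284))
))

# Label each data frame with its metric and variable, then combine (as above).
level1 <- lapply(dists, function(x) {
  do.call(rbind, lapply(names(x), function(y) { x[[y]]$metric <- y; x[[y]] }))
})
dat <- do.call(rbind, lapply(names(level1), function(x) {
  level1[[x]]$variable <- x; level1[[x]]
}))

# One piece per variable/metric combination that actually occurs.
pieces <- split(dat, list(dat$variable, dat$metric), drop = TRUE)

out_file <- file.path(tempdir(), "distribution_charts.pdf")  # assumed location
pdf(out_file)
for (nm in names(pieces)) {
  d <- pieces[[nm]]
  d <- d[complete.cases(d[, groups]), ]   # drop rows that are all NA
  m <- as.matrix(d[, groups])             # columns = sub-populations
  rownames(m) <- d$unique_values          # rows = stacked segments
  barplot(m, main = nm, legend.text = TRUE, las = 2, cex.names = 0.7)
}
invisible(dev.off())
```

Each page of the PDF holds one chart with one stacked bar per sub-population. Rows containing NA are dropped before plotting, which is a choice made for this sketch rather than something the question specifies.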