Convert list of overlapping data.frames into single data.frame

50 views Asked by At

I have some population information from multiple cohorts in a list. Each cohort covers an overlapping time period. The data looks like the following:

> raw.data
$`1`
  Year     Pop
1 1920 1927433
2 1921 1914551
3 1922 1900776

$`2`
  Year     Pop
1 1921 1915576
2 1922 1902075
3 1923 1887613

$`3`
  Year     Pop
1 1922 1902111
2 1923 1887862
3 1924 1872695

I would like to convert it into a single data frame where the column names are the years and the diagonals are population data. The output should look like the following:

> resulting.data
     1920    1921    1922    1923    1924
1 1927433 1915576 1902111      NA      NA
2      NA 1914551 1902075 1887862      NA
3      NA      NA 1900776 1887613 1872695

You can find example data for the input and desired output below:

raw.data <- structure(list(`1` = structure(list(Year = 1920:1922, Pop = c(1927433L, 1914551L, 1900776L)), .Names = c("Year", "Pop"), row.names = c(NA, 3L), class = "data.frame"), `2` = structure(list(Year = 1921:1923, Pop = c(1915576L, 1902075L, 1887613L)), .Names = c("Year", "Pop"), row.names = c(NA, 3L), class = "data.frame"), `3` = structure(list(Year = 1922:1924, Pop = c(1902111L, 1887862L, 1872695L)), .Names = c("Year", "Pop"), row.names = c(NA, 3L), class = "data.frame")), .Names = c("1", "2", "3"))
resulting.data <- structure(list(X1920 = c(1927433, NA, NA), X1921 = c(1915576, 1914551, NA), X1922 = c(1902111, 1902075, 1900776), X1923 = c(NA, 1887862, 1887613), X1924 = c(NA, NA, 1872695)), .Names = c("X1920", "X1921", "X1922", "X1923", "X1924"), row.names = c(NA, -3L), class = "data.frame")

I have looked at this link which offers a similar question but I was unsuccessful in adapting it to my needs. I have also attempted to use plyr to try to obtain the diagonals first and then combine them but then I'm not sure how to do the combining.

1

There are 1 answers

3
Andrie On BEST ANSWER

Use do.call() with rbind() to convert the data to a single data frame, then reshape2::dcast() for the reshaping:

dat <- do.call(rbind, raw.data)
dat$obs <- gsub(".*?\\.", "", row.names(dat))

library(reshape2)
dcast(dat, obs ~ Year, fun.aggregate = sum, value.var = "Pop")

     obs    1920    1921    1922    1923    1924
1   1 1927433 1915576 1902111       0       0
2   2       0 1914551 1902075 1887862       0
3   3       0       0 1900776 1887613 1872695