Is there a way in R to select first instance of columns and arrange a list vertically?

45 views Asked by At

My current data structure is like this.

 age_group age_division    total_estimate zipcode  single_age
   <chr>            <dbl>       <dbl>      <dbl>      <list>    
1   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
2   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
3   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
4   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
5   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
6   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
7   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
8   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
9   0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
10  0-4              2         10        51557   list(C(0, 0, 1, 1, 2, 2, 3, 3, 4, 4))
11  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
12  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
13  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
14  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
15  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
16  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
17  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
18  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
19  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
20  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
21  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
22  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
23  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
24  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
25  0-4              3         15        51558   list(C(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))
26  85-90            1         8         51559   list(C(0, 1, 2, 3, 4))
27  85-90            1         8         51559   list(C(0, 1, 2, 3, 4))
28  85-90            1         8         51559   list(C(0, 1, 2, 3, 4))
29  85-90            1         8         51559   list(C(0, 1, 2, 3, 4))
30  85-90            1         8         51559   list(C(0, 1, 2, 3, 4))
31  85-90            1         8         51559   list(C(0, 1, 2, 3, 4))

For every first unique zipcode and age_group, I want to expand the corresponding first single_age list vertically. For example, if the first unique zipcode is 1, age_group is 0-4 and age_division is 2. Then the list should be in a vertical format where we have 0, 0, 1, 1, 2, 2, 3, 3, 4, 4. This is my code where it is giving me single_age as a list.

# Function to expand age groups
expand_age <- function(age_group, age_division) {
  age <- rep(seq(as.numeric(strsplit(age_group, "-")[[1]][1]), 
                 as.numeric(strsplit(age_group, "-")[[1]][2])), 
             each = age_division)
  return(list(age))
}

# Apply the function to each row of the dataframe using mutate
df <- df_1 %>%
  group_by(zipcode) %>%
  mutate(single_age = list(expand_age(first(age_group), first(age_division))))

This is what the expected output should look like.

age_group age_division    total_estimate zipcode  single_age
   <chr>            <dbl>       <dbl>      <dbl>      <dbl>    
1   0-4              2         10        51557          0
2   0-4              2         10        51557          0
3   0-4              2         10        51557          1
4   0-4              2         10        51557          1
5   0-4              2         10        51557          2
6   0-4              2         10        51557          2
7   0-4              2         10        51557          3
8   0-4              2         10        51557          3
9   0-4              2         10        51557          4
10  0-4              2         10        51557          4
11  0-4              3         15        51558          0
12  0-4              3         15        51558          0
13  0-4              3         15        51558          0
14  0-4              3         15        51558          1
15  0-4              3         15        51558          1
16  0-4              3         15        51558          1
17  0-4              3         15        51558          2
18  0-4              3         15        51558          2
19  0-4              3         15        51558          2
20  0-4              3         15        51558          3
21  0-4              3         15        51558          3
22  0-4              3         15        51558          3
23  0-4              3         15        51558          4
24  0-4              3         15        51558          4
25  0-4              3         15        51558          4
26  85-90            1         8         51559         85
27  85-90            1         8         51559         86
28  85-90            1         8         51559         87
29  85-90            1         8         51559         88
30  85-90            1         8         51559         89
31  85-90            1         8         51559         90
2

There are 2 answers

2
Adriano Mello On

It's always better to share your data with dput, as @Onyambu asked.

With the given format, unnest_longer(df_1, single_age) will not work because:

  1. The lists inside single_age have just one object - a integer vector;
  2. C() (with capital "C") is different from c().

You can try this:

library(tidyverse)

# Data with c() instead of C()
aux <- tribble(
~age_group, ~age_division,    ~total_estimate, ~zipcode,  ~single_age,
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           2,         10,        51557,   list(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"0-4",           3,         15,        51558,   list(c(0, 0 ,0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
"85-90",            1,         8  ,       51559,   list(c(0, 1, 2, 3, 4)),
"85-90",            1,         8  ,       51559,   list(c(0, 1, 2, 3, 4)),
"85-90",            1,         8  ,       51559,   list(c(0, 1, 2, 3, 4)),
"85-90",            1,         8  ,       51559,   list(c(0, 1, 2, 3, 4)),
"85-90",            1,         8  ,       51559,   list(c(0, 1, 2, 3, 4)),
"85-90",            1,         8  ,       51559,   list(c(0, 1, 2, 3, 4))
)

Before unnest, simplify it:

aux <- aux %>% 
  distinct(age_group, zipcode, .keep_all = TRUE) %>% 
  mutate(single_age = list_simplify(single_age)) %>% 
  unnest_longer(single_age)

The output:

> aux
# A tibble: 30 × 5
   age_group age_division total_estimate zipcode single_age
   <chr>            <dbl>          <dbl>   <dbl>      <dbl>
 1 0-4                  2             10   51557          0
 2 0-4                  2             10   51557          0
 3 0-4                  2             10   51557          1
 4 0-4                  2             10   51557          1
 5 0-4                  2             10   51557          2
 6 0-4                  2             10   51557          2
 7 0-4                  2             10   51557          3
 8 0-4                  2             10   51557          3
 9 0-4                  2             10   51557          4
10 0-4                  2             10   51557          4
11 0-4                  3             15   51558          0
12 0-4                  3             15   51558          0
13 0-4                  3             15   51558          0
14 0-4                  3             15   51558          1
15 0-4                  3             15   51558          1
16 0-4                  3             15   51558          1
17 0-4                  3             15   51558          2
18 0-4                  3             15   51558          2
19 0-4                  3             15   51558          2
20 0-4                  3             15   51558          3
21 0-4                  3             15   51558          3
22 0-4                  3             15   51558          3
23 0-4                  3             15   51558          4
24 0-4                  3             15   51558          4
25 0-4                  3             15   51558          4
26 85-90                1              8   51559          0
27 85-90                1              8   51559          1
28 85-90                1              8   51559          2
29 85-90                1              8   51559          3
30 85-90                1              8   51559          4
0
Onyambu On

Here is the code using the data from the other answer:

library(tidyverse)
unnest_longer(unnest_longer(unique(df), single_age), single_age)

# A tibble: 30 × 5
   age_group age_division total_estimate zipcode single_age
   <chr>            <dbl>          <dbl>   <dbl>      <dbl>
 1 0-4                  2             10   51557          0
 2 0-4                  2             10   51557          0
 3 0-4                  2             10   51557          1
 4 0-4                  2             10   51557          1
 5 0-4                  2             10   51557          2
 6 0-4                  2             10   51557          2
 7 0-4                  2             10   51557          3
 8 0-4                  2             10   51557          3
 9 0-4                  2             10   51557          4
10 0-4                  2             10   51557          4
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows