Is there a way to create even single year age from the groups based on a weight?

68 views Asked by At

I am trying to do some advanced data manipulation here.

The data below has age_group categories in character format. I also have age_division which is a number, we get this by dividing the age_group interval over the total_estimate for that age_group. For example, age_group = '< 5 yrs' has an estimate of 20 and we divide that by 5 which is 10. So, I am trying to have each single year age shown per age group in equal units. For age_group '< 5 yrs', want to start from 0 yrs and for age_group ' > 85 yrs' we consider the maximum age to be 90 yrs.

Current data:

 age_group age_division    total_estimate
   <chr>            <dbl>       <dbl>
1 < 5 yrs              2         10
2 < 5 yrs              2         10
3 < 5 yrs              2         10
4 < 5 yrs              2         10
5 < 5 yrs              2         10
6 < 5 yrs              2         10
7 < 5 yrs              2         10
8 < 5 yrs              2         10
9 < 5 yrs              2         10
10 < 5 yrs             2         10
11 5_9 yrs             4         20
12 5_9 yrs             4         20
13 5_9 yrs             4         20
14 5_9 yrs             4         20
15 5_9 yrs             4         20
16 5_9 yrs             4         20
17 5_9 yrs             4         20
18 5_9 yrs             4         20
19 5_9 yrs             4         20
20 5_9 yrs             4         20
21 5_9 yrs             4         20
22 5_9 yrs             4         20
23 5_9 yrs             4         20
24 5_9 yrs             4         20
25 5_9 yrs             4         20
26 5_9 yrs             4         20
27 5_9 yrs             4         20
28 5_9 yrs             4         20
29 5_9 yrs             4         20
30 5_9 yrs             4         20
31 18_19 yrs           2          4
31 18_19 yrs           2          4
32 18_19 yrs           2          4
33 18_19 yrs           2          4
33 >85 yrs             5         25
34 >85 yrs             5         25
35 >85 yrs             5         25
36 >85 yrs             5         25
37 >85 yrs             5         25
38 >85 yrs             5         25
39 >85 yrs             5         25
40 >85 yrs             5         25
41 >85 yrs             5         25
42 >85 yrs             5         25
43 >85 yrs             5         25
44 >85 yrs             5         25
45 >85 yrs             5         25
46 >85 yrs             5         25
47 >85 yrs             5         25
48 >85 yrs             5         25
49 >85 yrs             5         25
50 >85 yrs             5         25
51 >85 yrs             5         25
52 >85 yrs             5         25
53 >85 yrs             5         25
54 >85 yrs             5         25
55 >85 yrs             5         25
56 >85 yrs             5         25
57 >85 yrs             5         25

Expected output:

 age_group age_division      age_single
   <chr>            <dbl>        <dbl>  
1 < 5 yrs              2           0
2 < 5 yrs              2           0    
3 < 5 yrs              2           1
4 < 5 yrs              2           1
5 < 5 yrs              2           2
6 < 5 yrs              2           2
7 < 5 yrs              2           3
8 < 5 yrs              2           3
9 < 5 yrs              2           4
10 < 5 yrs             2           4
11 5_9 yrs             4           5
12 5_9 yrs             4           5
13 5_9 yrs             4           5
14 5_9 yrs             4           5
15 5_9 yrs             4           6
16 5_9 yrs             4           6
17 5_9 yrs             4           6
18 5_9 yrs             4           6
19 5_9 yrs             4           7
20 5_9 yrs             4           7
21 5_9 yrs             4           7
22 5_9 yrs             4           7
23 5_9 yrs             4           8 
24 5_9 yrs             4           8
25 5_9 yrs             4           8
26 5_9 yrs             4           8
27 5_9 yrs             4           9
28 5_9 yrs             4           9
29 5_9 yrs             4           9
30 5_9 yrs             4           9 
31 18_19 yrs           2          18
31 18_19 yrs           2          18
32 18_19 yrs           2          19
33 18_19 yrs           2          19
33 >85 yrs             5          85
34 >85 yrs             5          85
35 >85 yrs             5          85
36 >85 yrs             5          85
37 >85 yrs             5          85
38 >85 yrs             5          86
39 >85 yrs             5          86
40 >85 yrs             5          86
41 >85 yrs             5          86
42 >85 yrs             5          86
43 >85 yrs             5          87
44 >85 yrs             5          87
45 >85 yrs             5          87
46 >85 yrs             5          87
47 >85 yrs             5          87
48 >85 yrs             5          88
49 >85 yrs             5          88
50 >85 yrs             5          88
51 >85 yrs             5          88
52 >85 yrs             5          88
53 >85 yrs             5          89 
54 >85 yrs             5          89
55 >85 yrs             5          89
56 >85 yrs             5          89
57 >85 yrs             5          89

dput of the dataframe:

structure(list(estimate = c(20, 20, 20, 20, 20, 20), age_group = c("< 5 yrs", 
"< 5 yrs", "< 5 yrs", "< 5 yrs", "< 5 yrs", "< 5 yrs"), age_division = c(4, 
4, 4, 4, 4, 4)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Any help with this is much appreciated.

1

There are 1 answers

1
I_O On

what about:

library(dplyr)
library(tidyr)

age_lims <- list("< 5 yrs" = c(0, 4),
                   "5_9 y rs" = c(5, 9), ## note typo
                   "18_19yrs" = c(18, 19),
                   ">85 yrs" = c(85, 90)
                   )

d |>
  select(age_group) |> ## only need age group
  count(age_group) |>
  reframe(total_estimate = n,
          lims = age_lims[age_group],
          age_division = 1 + dist(range(lims)),
          age_single = rep(lims[[1]][1] : lims[[1]][2], 
                           each = age_division
                           ),
          .by = age_group                     
          ) |>
  select(starts_with('age_'))
##    age_group age_division age_single
## 1   18_19yrs            2         18
## 2   18_19yrs            2         18
## 3   18_19yrs            2         19
## 4   18_19yrs            2         19
## 5   5_9 y rs            5          5
## 6   5_9 y rs            5          5
## 7   5_9 y rs            5          5
## 8   5_9 y rs            5          5
## 9   5_9 y rs            5          5
## 10  5_9 y rs            5          6
## [truncated]

note that

  • I used your data initially supplied
  • age division and total estimate are calculated on the fly, no need to supply them with the input data (and probably no need to keep them in the output)