I am trying to do some advanced data manipulation here.
The data below has age_group categories in character format. I also have age_division which is a number, we get this by dividing the age_group interval over the total_estimate for that age_group. For example, age_group = '< 5 yrs' has an estimate of 20 and we divide that by 5 which is 10. So, I am trying to have each single year age shown per age group in equal units. For age_group '< 5 yrs', want to start from 0 yrs and for age_group ' > 85 yrs' we consider the maximum age to be 90 yrs.
Current data:
age_group age_division total_estimate
<chr> <dbl> <dbl>
1 < 5 yrs 2 10
2 < 5 yrs 2 10
3 < 5 yrs 2 10
4 < 5 yrs 2 10
5 < 5 yrs 2 10
6 < 5 yrs 2 10
7 < 5 yrs 2 10
8 < 5 yrs 2 10
9 < 5 yrs 2 10
10 < 5 yrs 2 10
11 5_9 yrs 4 20
12 5_9 yrs 4 20
13 5_9 yrs 4 20
14 5_9 yrs 4 20
15 5_9 yrs 4 20
16 5_9 yrs 4 20
17 5_9 yrs 4 20
18 5_9 yrs 4 20
19 5_9 yrs 4 20
20 5_9 yrs 4 20
21 5_9 yrs 4 20
22 5_9 yrs 4 20
23 5_9 yrs 4 20
24 5_9 yrs 4 20
25 5_9 yrs 4 20
26 5_9 yrs 4 20
27 5_9 yrs 4 20
28 5_9 yrs 4 20
29 5_9 yrs 4 20
30 5_9 yrs 4 20
31 18_19 yrs 2 4
31 18_19 yrs 2 4
32 18_19 yrs 2 4
33 18_19 yrs 2 4
33 >85 yrs 5 25
34 >85 yrs 5 25
35 >85 yrs 5 25
36 >85 yrs 5 25
37 >85 yrs 5 25
38 >85 yrs 5 25
39 >85 yrs 5 25
40 >85 yrs 5 25
41 >85 yrs 5 25
42 >85 yrs 5 25
43 >85 yrs 5 25
44 >85 yrs 5 25
45 >85 yrs 5 25
46 >85 yrs 5 25
47 >85 yrs 5 25
48 >85 yrs 5 25
49 >85 yrs 5 25
50 >85 yrs 5 25
51 >85 yrs 5 25
52 >85 yrs 5 25
53 >85 yrs 5 25
54 >85 yrs 5 25
55 >85 yrs 5 25
56 >85 yrs 5 25
57 >85 yrs 5 25
Expected output:
age_group age_division age_single
<chr> <dbl> <dbl>
1 < 5 yrs 2 0
2 < 5 yrs 2 0
3 < 5 yrs 2 1
4 < 5 yrs 2 1
5 < 5 yrs 2 2
6 < 5 yrs 2 2
7 < 5 yrs 2 3
8 < 5 yrs 2 3
9 < 5 yrs 2 4
10 < 5 yrs 2 4
11 5_9 yrs 4 5
12 5_9 yrs 4 5
13 5_9 yrs 4 5
14 5_9 yrs 4 5
15 5_9 yrs 4 6
16 5_9 yrs 4 6
17 5_9 yrs 4 6
18 5_9 yrs 4 6
19 5_9 yrs 4 7
20 5_9 yrs 4 7
21 5_9 yrs 4 7
22 5_9 yrs 4 7
23 5_9 yrs 4 8
24 5_9 yrs 4 8
25 5_9 yrs 4 8
26 5_9 yrs 4 8
27 5_9 yrs 4 9
28 5_9 yrs 4 9
29 5_9 yrs 4 9
30 5_9 yrs 4 9
31 18_19 yrs 2 18
31 18_19 yrs 2 18
32 18_19 yrs 2 19
33 18_19 yrs 2 19
33 >85 yrs 5 85
34 >85 yrs 5 85
35 >85 yrs 5 85
36 >85 yrs 5 85
37 >85 yrs 5 85
38 >85 yrs 5 86
39 >85 yrs 5 86
40 >85 yrs 5 86
41 >85 yrs 5 86
42 >85 yrs 5 86
43 >85 yrs 5 87
44 >85 yrs 5 87
45 >85 yrs 5 87
46 >85 yrs 5 87
47 >85 yrs 5 87
48 >85 yrs 5 88
49 >85 yrs 5 88
50 >85 yrs 5 88
51 >85 yrs 5 88
52 >85 yrs 5 88
53 >85 yrs 5 89
54 >85 yrs 5 89
55 >85 yrs 5 89
56 >85 yrs 5 89
57 >85 yrs 5 89
dput of the dataframe:
structure(list(estimate = c(20, 20, 20, 20, 20, 20), age_group = c("< 5 yrs",
"< 5 yrs", "< 5 yrs", "< 5 yrs", "< 5 yrs", "< 5 yrs"), age_division = c(4,
4, 4, 4, 4, 4)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Any help with this is much appreciated.
what about:
note that
age divisionandtotal estimateare calculated on the fly, no need to supply them with the input data (and probably no need to keep them in the output)