I'd just like to have a table that contains the ID and a new categorical variable (called e.g. Actual_landcover, having 40 categories) which is derived from the variables LandcoverXXXX and Sub_landcoverXXXX.
I have a dataset that contains the following variables:
ID,
Year (2014-2023),
Landcover2013 (categorical variable 1-4),
Landcover2015,
Landcover2017,
Landcover2019,
Landcover2021,
Sub_landcover2013 (subcategories 1-10),
Sub_landcover2015,
Sub_landcover2017,
Sub_landcover2019,
Sub_landcover2021
I'd like to add a new column to my dataset. The new column would contain a new categorical variable describing the actual landcover. With 4 categories and 10 subcategories, its values can in theory range from 1 to 40 (4 x 10 = 40). Because the landcover values vary over time, I want the rule to use the newest data available relative to the variable 'Year'. For example, if 'Year' is 2014, the new value is derived from 'Landcover2013' and 'Sub_landcover2013'. If 'Year' is 2015, I still want to use the 2013 data, because the landcover data is published at the end of the years mentioned in the existing column names.
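To make the year rule concrete, here is a minimal sketch in base R. The helper name `latest_release` is hypothetical, and it assumes every 'Year' is later than 2013, so that at least one earlier release exists.

```r
# Landcover release years, taken from the existing column names.
landcover_years <- c(2013, 2015, 2017, 2019, 2021)

# Hypothetical helper: for each observation year, return the latest
# release year that is strictly earlier (data is published at year end).
latest_release <- function(year) {
  year <- as.integer(year)
  vapply(
    year,
    function(y) max(landcover_years[landcover_years < y]),
    numeric(1)
  )
}

latest_release(c("2014", "2015", "2016", "2017", "2023"))
# 2013 2013 2015 2015 2021
```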
df <- data.frame(ID = c("1", "2", "3", "4", "5"),
                 Year = c("2014", "2014", "2016", "2017", "2023"),
                 Landcover2013 = c("1", "1", "2", "1", "4"),
                 Landcover2015 = c("1", "1", "3", "2", "4"),
                 Landcover2017 = c("1", "1", "2", "2", "3"),
                 Landcover2019 = c("2", "1", "2", "2", "4"),
                 Landcover2021 = c("2", "1", "3", "1", "4"),
                 Sub_landcover2013 = c("4", "7", "5", "9", "1"),
                 Sub_landcover2015 = c("5", "7", "6", "9", "2"),
                 Sub_landcover2017 = c("4", "6", "6", "9", "1"),
                 Sub_landcover2019 = c("4", "6", "6", "9", "2"),
                 Sub_landcover2021 = c("4", "6", "6", "10", "1"))
The following two tables are just examples of rules which could derive new values. The new values could be formed like:
   LandcoverXXXX Sub_landcoverXXXX New_value
1              1                 1         1
2              1                 2         2
3              1                 3         3
4              1                 4         4
5              1                 5         5
6              1                 6         6
7              1                 7         7
8              1                 8         8
9              1                 9         9
10             1                10        10
11             2                 1        11
12             2                 2        12
13             2                 3        13
14             2                 4        14
15             2                 5        15
16             2                 6        16
17             2                 7        17
18             2                 8        18
19             2                 9        19
20             2                10        20
21             3                 1        21
and so on...
OR like:
   LandcoverXXXX Sub_landcoverXXXX New_value
1              1                 1        11
2              1                 2        12
3              1                 3        13
4              1                 4        14
5              1                 5        15
6              1                 6        16
7              1                 7        17
8              1                 8        18
9              1                 9        19
10             1                10       110
11             2                 1        21
12             2                 2        22
13             2                 3        23
14             2                 4        24
15             2                 5        25
16             2                 6        26
17             2                 7        27
18             2                 8        28
19             2                 9        29
20             2                10       210
21             3                 1        31
and so on...
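Both example rules can be written as one-line functions. This is only a sketch: the function names `code_sequential` and `code_pasted` are hypothetical, and the first one assumes exactly 10 subcategories per landcover category.

```r
# Rule from the first table: consecutive codes 1..40.
code_sequential <- function(landcover, sub_landcover) {
  (as.integer(landcover) - 1L) * 10L + as.integer(sub_landcover)
}

# Rule from the second table: paste the two codes together.
code_pasted <- function(landcover, sub_landcover) {
  paste0(landcover, sub_landcover)
}

code_sequential(c("1", "2", "3"), c("10", "1", "1"))
# 10 11 21
code_pasted(c("1", "2", "3"), c("10", "1", "1"))
# "110" "21" "31"
```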
But those two tables are just examples of how the new values could be formed. The final output table could look like this:
  ID Actual_landcover
1  1               14
2  2               17
3  3               36
4  4               29
5  5               41
Edit: This answer was posted prior to the OP's update and is for replicating the pattern highlighted in the data image only.
Using tidyverse packages, here's a way to achieve your goal. The workflow:
pivot_longer()
slice() each group by the maximum year in the "group1" column that is less than the value in the "Year" column
pivot_wider() to get two separate landcover type columns
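A minimal sketch of that workflow, using the `df` from the question. Apart from "group1", the details here are assumptions made for illustration: the `names_pattern` regex, the "type" column name, and the final `paste0()` step, which follows the second example rule in the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c("1", "2", "3", "4", "5"),
                 Year = c("2014", "2014", "2016", "2017", "2023"),
                 Landcover2013 = c("1", "1", "2", "1", "4"),
                 Landcover2015 = c("1", "1", "3", "2", "4"),
                 Landcover2017 = c("1", "1", "2", "2", "3"),
                 Landcover2019 = c("2", "1", "2", "2", "4"),
                 Landcover2021 = c("2", "1", "3", "1", "4"),
                 Sub_landcover2013 = c("4", "7", "5", "9", "1"),
                 Sub_landcover2015 = c("5", "7", "6", "9", "2"),
                 Sub_landcover2017 = c("4", "6", "6", "9", "1"),
                 Sub_landcover2019 = c("4", "6", "6", "9", "2"),
                 Sub_landcover2021 = c("4", "6", "6", "10", "1"))

result <- df %>%
  # Split each column name into its type ("Landcover" or "Sub_landcover")
  # and its four-digit release year ("group1").
  pivot_longer(
    cols = -c(ID, Year),
    names_to = c("type", "group1"),
    names_pattern = "([A-Za-z_]+)([0-9]{4})"
  ) %>%
  mutate(Year = as.integer(Year), group1 = as.integer(group1)) %>%
  # Per ID, keep the rows of the latest release strictly before Year.
  group_by(ID) %>%
  slice(which(group1 == max(group1[group1 < Year]))) %>%
  ungroup() %>%
  # Back to one row per ID with the two landcover columns side by side.
  pivot_wider(names_from = type, values_from = value) %>%
  # Assumed rule: paste category and subcategory together.
  mutate(Actual_landcover = paste0(Landcover, Sub_landcover)) %>%
  select(ID, Actual_landcover)

result
```

With the sample data this gives 14, 17, 36, 29 and 41 for IDs 1 to 5, matching the expected output in the question. To get codes in the 1 to 40 range instead, the `paste0()` call could be replaced with `(as.integer(Landcover) - 1) * 10 + as.integer(Sub_landcover)`.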