I have the following table in R, which lists a person's race, gender, age, and cholesterol test result. Age and cholesterol test are stored as dummy variables. Age can be low, medium, or high, while a cholesterol test can be low or high. I want to collapse the age and cholesterol columns into single columns where low is coded as 1, medium as 2, and high as 3. A cholesterol test can be neither low nor high if a person never took one; in that case it should be N/A in the expected output. I want the solution to be dynamic, so that the code still works if I have more columns in this format (i.e. there may be new tests, which can be categorized as low, medium, or high via dummy variables).
How can I do this in R?
input:
  race  gender age.low_tm1 age.medium_tm1 age.high_tm1 chol_test.low_tm1 chol_test.high_tm1
  <chr>  <int>       <int>          <int>        <int>             <int>              <int>
1 white      0           1              0            0                 0                  0
2 white      0           1              0            0                 0                  0
3 white      1           1              0            0                 0                  0
4 black      1           0              1            0                 0                  0
5 white      0           0              0            1                 0                  1
6 black      0           0              1            0                 1                  0
expected output:
  race  gender age chol_test
1 white      0   1       n/a
2 white      0   1       n/a
3 white      1   1       n/a
4 black      1   2       n/a
5 white      0   3         3
6 black      0   2         1
We could first define a custom function that lets us recode dummy variables based on their variable names, below called var_nm2value. This function takes the values of the variable as its x argument; in dplyr::across this is the .x part. It also takes a list of name-value pairs as its value_ls argument. The function loops over the list of name-value pairs and checks whether a name in value_ls is found in the variable name. To do this it uses grepl on dplyr::cur_column(). If we have a match, we replace all 1s with the corresponding value from value_ls and return all other values, that is the zeros, as is.

Then we can define a list of recode values, below called recode_ls.

Finally, we use purrr::map_dfc in a dplyr::summarise where we i) use the variable strings we want to create, "age" and "chol_test", then ii) select only the columns which contain this string, and in each iteration we iii) apply dplyr::across to recode the values, iv) pipe the result into a do.call to get the max, and finally v) recode 0s to NA.

Created on 2022-01-03 by the reprex package (v0.3.0)
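The steps described above can be sketched roughly as follows. This is a minimal reconstruction under a few assumptions, not necessarily the exact code of the original answer: the input is rebuilt by hand from the table in the question, the helper recode_dummies is a hypothetical name introduced here, columns are matched with starts_with on the "<name>." prefix rather than on any occurrence of the string, and the recoded columns are bound to race and gender with bind_cols instead of being built inside summarise. In recent purrr versions map_dfc is superseded; map() followed by dplyr::bind_cols() should behave the same.

```r
library(dplyr)
library(purrr)

# Input data, rebuilt by hand from the table in the question
dat <- data.frame(
  race               = c("white", "white", "white", "black", "white", "black"),
  gender             = c(0L, 0L, 1L, 1L, 0L, 0L),
  age.low_tm1        = c(1L, 1L, 1L, 0L, 0L, 0L),
  age.medium_tm1     = c(0L, 0L, 0L, 1L, 0L, 1L),
  age.high_tm1       = c(0L, 0L, 0L, 0L, 1L, 0L),
  chol_test.low_tm1  = c(0L, 0L, 0L, 0L, 0L, 1L),
  chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L)
)

# Recode one dummy column: find the level name ("low", "medium", "high")
# in the current column name and replace 1s with that level's value.
# Must be called inside dplyr::across(), because it relies on cur_column().
var_nm2value <- function(x, value_ls) {
  for (i in seq_along(value_ls)) {
    if (grepl(names(value_ls)[i], dplyr::cur_column(), fixed = TRUE)) {
      return(ifelse(x == 1, value_ls[[i]], x))
    }
  }
  x  # no level name matched: return the column unchanged
}

# Name-value pairs used for recoding
recode_ls <- list(low = 1, medium = 2, high = 3)

# Hypothetical helper (not from the original answer): collapse all dummy
# columns that start with "<prefix>." into a single vector.
recode_dummies <- function(data, prefix, value_ls) {
  recoded <- data %>%
    select(starts_with(paste0(prefix, "."))) %>%
    mutate(across(everything(), ~ var_nm2value(.x, value_ls)))
  # Row-wise maximum over the recoded columns; rows with no dummy set give 0
  out <- do.call(pmax, unname(as.list(recoded)))
  # 0 means no level was recorded, so treat it as missing
  out[out == 0] <- NA
  out
}

# Variables to create; adding a new test only requires adding its prefix here
vars <- c("age", "chol_test")

res <- dat %>%
  select(race, gender) %>%
  bind_cols(map_dfc(set_names(vars), ~ recode_dummies(dat, .x, recode_ls)))

res
```

For the sample input, this should give age = 1, 1, 1, 2, 3, 2 and chol_test = NA, NA, NA, NA, 3, 1, which matches the expected output in the question. New levels can be handled by extending recode_ls, and new tests by extending vars, as long as the columns follow the same "<name>.<level>_tm1" naming pattern.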