I am trying to convert columns that are one-hot encoded from a multiple-choice (multiple answers can be true) questionnaire to only one column, without affecting the other variables. I found similar threads that convert a one-hot encoded variable that is binary (only one column is 1, the others are 0).
Example data set:
df <- data.frame(
`ID` = c(01,02,03),
`Context_Family` = c(0,1,0),
`Context_Friends` = c(1,1,0),
`Context_Spouse` = c(0,1,0),
`Context_Alone` = c(0,0,1),
`Disposition_Stress` = c(0,1,0),
`Disposition_Melancholic` = c(1,1,0),
Stress = c(20,24,35)
)
What I want is a data frame of the form:
ID | Context | Disposition | Stress
1 | Friends | Melancholic | 20
2 | Family, Friends, Spouse | Stress, Melancholic | 24
3 | Alone | Not stressed nor melancholic | 35
In other words, I want to have one column for Context, one for Disposition. If all values are '0' for the disposition or context for example, I want this to be captured (e.g., ID 3, not stressed nor melancholic), otherwise just put the variables that are '1' (e.g., ID 2, he has family, friends and spouse under 'Context' as he put 1 to those columns in the original data set). The other column 'stress' which is good is not changed.
The code should be adaptable to many more columns that have this format like Context_ and Disposition_, I put here a simplification. In pseudocode it would be nice to have a function or something like that which can basically be:
Give me the column names that are hot-encoded: "Context_", "Disposition_"
For these columns, I will make two columns (in this case) e.g., "Context", "Disposition"
If an observation (row) does not have 1 to any of them, I will simply take all the different column names and say for example for "Context" -> "Not stressed nor melancholic"
I tried doing this with dplyr but with no success so far.
We can get the data in long format (
pivot_longer) bringingContextandDispositionin columns of it's own then for eachIDsummarisethe data in one row by giving the values only for those whose value is 1.You may then
replacethe emptyDispositionvalue ("") to"Not stressed nor melancholic"by adding the following -