I have a quite complex data frame structure:
ID = c(1,2,3)
Sessions = c("2023-11-14 19:01:39+01:00", "2023-11-14 20:01:39+01:00", "2023-11-14 21:01:39+01:00")
P_affect = c(10,20,30)
N_affect = c(15,30,40)
NMeals = c(0,1,2)
Meal1_Where_Home = c(NA, 1, 0)
Meal1_Where_Restaurant = c(NA, 0, 1)
Meal1_Who_Alone = c(NA, 1, 0)
Meal1_Who_Friends = c(NA, 0 , 1 )
Meal1_Type_Big_Meal = c(NA, 1, 1)
Meal1_Type_Small_Meal = c(NA, 0, 0)
Meal2_Where_Home = c(NA, NA, 1)
Meal2_Where_Restaurant = c(NA, NA, 0)
Meal2_Who_Alone = c(NA, NA, 1)
Meal2_Who_Friends = c(NA, NA , 0 )
Meal2_Type_Big_Meal = c(NA, NA, 1)
Meal2_Type_Small_Meal = c(NA, NA, 0)
Meal3_Where_Home = c(NA, NA, NA)
Meal3_Where_Restaurant = c(NA, NA, NA)
Meal3_Who_Alone = c(NA, NA, NA)
Meal3_Who_Friends = c(NA, NA , NA )
Meal3_Type_Big_Meal = c(NA, NA, NA)
Meal3_Type_Small_Meal = c(NA, NA, NA)
# Create a data frame
df1 <- data.frame(ID, Sessions, P_affect, N_affect, NMeals, Meal1_Where_Home, Meal1_Where_Restaurant,
Meal1_Who_Alone, Meal1_Who_Friends, Meal1_Type_Big_Meal, Meal1_Type_Small_Meal,
Meal2_Where_Home, Meal2_Where_Restaurant, Meal2_Who_Alone, Meal2_Who_Friends,
Meal2_Type_Big_Meal, Meal2_Type_Small_Meal, Meal3_Where_Home, Meal3_Where_Restaurant,
Meal3_Who_Alone, Meal3_Who_Friends, Meal3_Type_Big_Meal, Meal3_Type_Small_Meal)
df2 <- data.frame(
`ID` = c(1,2,3),
`Context_Family` = c(0,1,0),
`Context_Friends` = c(1,1,0),
`Context_Spouse` = c(0,1,0),
`Context_Alone` = c(0,0,1),
`Disposition_Stress` = c(0,1,0),
`Disposition_Melancholic` = c(1,1,0),
Stress = c(20,24,35)
)
df = merge(df1,df2, by = 'ID')
What I would like to is essentially two steps:
- convert all the columns of a specific type (that start with "Context_" or "Disposition_") into their not hot-encoded form
- make the data set long format by meal number
Desired output:
ID | Sessions | P_affect | N_affect | NMeals | MealNumber | MealObs | MealValue | Context | Disposition
1 | 2023-11-14 19:01:39 | 10 | 15 | 0 | Meal1 | Where | NA | Friends | Melancholic
1 | 2023-11-14 19:01:39 | 10 | 15 | 0 | Meal1 | Who | NA | Friends | Melancholic
I tried for step1:
df_modified = df %>%
pivot_longer(col=starts_with("Context"), names_to="Context", names_prefix="Context_") %>%
filter(value==1) %>%
select(-value)
But this did not work very well, and also I would like an approach that just asks for column names and does the hot-encoding transformation for all of them, instead taking one by one. And for the long formatting:
data_long <- df %>%
pivot_longer(cols = starts_with("Meal"),
names_to = c("Meal Number", "Value"),
names_sep = "_",
values_to = "value")
Which works, but on a data set without hot-encoded values. I included a larger data frame just to check if the code works properly for all cases.
Your
df
dataset isn´t tidy because there´s more than one value per variable/observation. ForID == 2
,Context
is "Family", "Friends" and "Spouse". You´ll getlist-columns
:Pivoting:
EDIT: A alternative for list-columns is to
summarise
before thepivot_wider
step: