I'm trying to create a wide format dataset that includes only some of the long format data. The data come from learners going through an online learning module in which they sometimes get "stuck" on a screen and therefore have multiple attempts recorded for that screen.
lesson_long <- data.frame (id  = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279, 4256308, 4256308, 4256308, 4256308),
                           screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2", "survey1", "survey1", "survey2", "survey2"),
                           question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
                           variable = c("age", "country", "age", "country", "education", "course", "age", "country", "education", "course"),
                           response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1 ))
id       screen     question_attempt variable response
4256279  survey1            1           age       0
4256279  survey1            1         country     5
4256279  survey1            2           age       20
4256279  survey1            2         country     5
4256279  survey2            1        education    3
4256279  survey2            1         course      2
4256308  survey1            1           age       18
4256308  survey1            1         country     5
4256308  survey2            1        education    4
4256308  survey2            1         course      1
For my analyses I need to include only their response in their last attempt in each screen (or response on their max question_attempt - sometimes they have up to 8 or 9 attempts in each screen). All previous attempts will be dismissed and I don't need to have the screen name in the final dataset. The final wide format would look like this:
id        age  country education course
4256279   20     5         3         2
4256308   18     5         4         1
I've been trying to do this with just dcast (unsuccessfully):
lesson_wide <- dcast(lesson_long, `id` ~ variable, value.var = "response", fun.aggregate = max("question_attempt"), fill=0)
The fun.aggregate is obviously not working, as I made it up... But is there a solution for this? Or perhaps I need an additional step to select the data before using dcast? If so, how would I do that?
Curious to see your answers. Thanks in advance!
 
                        
You can order the data by id, screen and question_attempt and select the last value of each question_attempt.
Similarly, using dplyr and tidyr:
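A minimal sketch of that approach, assuming dplyr >= 1.0 and tidyr >= 1.0 are installed and that each variable belongs to a single screen per learner (an assumption that holds in the sample data but is not stated in the question). It rebuilds the example data so it runs on its own:

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
lesson_long <- data.frame(
  id = c(4256279, 4256279, 4256279, 4256279, 4256279, 4256279,
         4256308, 4256308, 4256308, 4256308),
  screen = c("survey1", "survey1", "survey1", "survey1", "survey2", "survey2",
             "survey1", "survey1", "survey2", "survey2"),
  question_attempt = c(1, 1, 2, 2, 1, 1, 1, 1, 1, 1),
  variable = c("age", "country", "age", "country", "education", "course",
               "age", "country", "education", "course"),
  response = c(0, 5, 20, 5, 3, 2, 18, 5, 4, 1)
)

lesson_wide <- lesson_long %>%
  # Sort so the highest attempt comes last within each id and screen
  arrange(id, screen, question_attempt) %>%
  # Keep only the final response per learner and variable
  group_by(id, variable) %>%
  summarise(response = last(response), .groups = "drop") %>%
  # One row per learner, one column per variable
  pivot_wider(names_from = variable, values_from = response)

lesson_wide
```

Because earlier attempts are dropped before reshaping, no aggregation function is needed in the pivot step. If the same variable could appear on more than one screen, the grouping would probably need to include screen as well.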