I have a big dataset, with 240 cases representing 240 patients. They have all undergone neuropsychological tests and filled in questionnaires. Additionally, their significant others (hereafter: proxies) have also filled in questionnaires. Since 'patient' and 'proxy' are nested in 'couples', I want to conduct a multilevel analysis in R. For this, I need to reshape my dataset so that I can run this kind of analysis.
Simply put, I want to 'duplicate' my rows. For the doubled subject IDs, I want to add a new variable with 1s and 2s, where 1 stands for patient data and 2 stands for proxy data. Then I want the rows to be filled with (1) all the patient data, with the columns that contain the proxy data set to NA or empty, and (2) all the proxy data, with the patient columns set to NA or empty.
Let's say this is my data:
id <- c(1:5)
names <- c('id', 'p1', 'p2', 'p3', 'pr1', 'pr2', 'pr3')
p1 <- c(sample(1:10, 5))
p2 <- c(sample(10:20, 5))
p3 <- c(sample(20:30, 5))
pr1 <- c(sample(1:10, 5))
pr2 <- c(sample(10:20, 5))
pr3 <- c(sample(20:30, 5))
mydf <- as.data.frame(matrix(c(id, p1, p2, p3, pr1, pr2, pr3), nrow = 5))
colnames(mydf) <- names
>mydf
id p1 p2 p3 pr1 pr2 pr3
1 1 6 20 22 1 10 24
2 2 8 11 24 2 18 29
3 3 7 10 25 6 20 26
4 4 3 14 20 10 15 20
5 5 5 19 29 7 14 22
I want my data finally to look like this:
id2 <- rep(c(1:5), each = 2)
names2 <- c('id', 'couple', 'q1', 'q2', 'q3')
couple <- rep(1:2, 5)
p1 <- c(sample(1:10, 5))
p2 <- c(sample(10:20, 5))
p3 <- c(sample(20:30, 5))
pr1 <- c(sample(1:10, 5))
pr2 <- c(sample(10:20, 5))
pr3 <- c(sample(20:30, 5))
mydf <- as.data.frame(matrix(c(id2, couple, p1, p2, p3, pr1, pr2, pr3), nrow = 10, ncol = 5))
colnames(mydf) <- names2
>mydf
id couple q1 q2 q3
1 1 1 6 23 16
2 1 2 10 28 10
3 2 1 1 27 14
4 2 2 7 21 20
5 3 1 5 30 18
6 3 2 12 2 27
7 4 1 10 1 25
8 4 2 13 7 21
9 5 1 11 6 20
10 5 2 18 3 23
Or, if this is not possible, like this:
id couple bb1 bb2 bb3 pbb1 pbb2 pbb3
1 1 1 6 23 16
2 1 2 10 28 10
3 2 1 1 27 14
4 2 2 7 21 20
5 3 1 5 30 18
6 3 2 12 2 27
7 4 1 10 1 25
8 4 2 13 7 21
9 5 1 11 6 20
10 5 2 18 3 23
To get there, I've tried the melt() function and the gather() function. It feels like I'm close, but it's still not working the way I want it to.
Note: in my dataset the variable names are bb1:bb54 for the patient questionnaire and pbb1:pbb54 for the proxy questionnaire.
Example of what I've tried:
df_long <- df_reshape %>%
  gather(testname, value, -(bb1:bb11), -(pbb1:pbb11), -id, -pgebdat, -p_age, na.rm = TRUE) %>%
  arrange(id)
One approach would be to use unite() and separate() in tidyr, along with the gather() function as well. I'm using your mydf data frame since it was provided, but it should be pretty straightforward to make any changes.
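A minimal sketch of the unite / gather / separate approach described here, not necessarily the exact code originally posted. It assumes the dplyr and tidyr packages are installed. The seed, the name df_long for the result, and the recoding of couple to 1 (patient) and 2 (proxy) are choices made for this illustration.

```r
library(dplyr)
library(tidyr)

# Toy data shaped like the question's mydf.
# The seed is an assumption added here so the sketch is reproducible.
set.seed(42)
mydf <- data.frame(
  id  = 1:5,
  p1  = sample(1:10, 5),
  p2  = sample(10:20, 5),
  p3  = sample(20:30, 5),
  pr1 = sample(1:10, 5),
  pr2 = sample(10:20, 5),
  pr3 = sample(20:30, 5)
)

df_long <- mydf %>%
  # Glue the patient answers into one "_"-separated string per row
  unite(p, p1:p3, sep = "_") %>%
  # Do the same for the proxy answers
  unite(pr, pr1:pr3, sep = "_") %>%
  # Stack both strings: one row per id and respondent type
  gather(couple, value, p, pr) %>%
  # Split the strings back into question columns; convert = TRUE
  # turns the pieces back into integers
  separate(value, into = c("q1", "q2", "q3"), sep = "_", convert = TRUE) %>%
  # Recode respondent type: 1 = patient, 2 = proxy
  mutate(couple = ifelse(couple == "p", 1L, 2L)) %>%
  arrange(id, couple)

df_long
```

For the real data, the same pattern should carry over by uniting bb1:bb54 and pbb1:pbb54 and passing something like paste0("q", 1:54) as the into argument.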
Our numbers are different since they were all randomly generated with sample().
Edited per @alistaire comment: add convert = TRUE to the separate() call to make sure the responses are still of class integer.