Insert NA's in case there are no observations when using subset() and then dcast or tapply

826 views Asked by At

I have the following data frame (this is only the head of the data frame). The ID column is subject (I have more subjects in the data frame, not only subject #99). I want to calculate the mean "rt" by "subject" and "condition" only for observations that have z.score (in absolute values) smaller than 1.

> b
  subject   rt ac condition     z.score
1      99 1253  1     200_9  1.20862682
2      99 1895  1     102_2  2.95813507
3      99 1049  1      68_1  1.16862102
4      99 1732  1      68_9  2.94415384
5      99  765  1      34_9 -0.63991180
7      99 1016  1      68_2 -0.03191493

I know I can to do it using tapply or dcast (from reshape2) after subsetting the data:

b1 <- subset(b, abs(z.score) < 1)

b2 <- dcast(b1, subject~condition, mean, value.var = "rt")

  subject      34_1      34_2      34_9      68_1      68_2     68_9     102_1     102_2    102_9     200_1     200_2    200_9
1      99 1028.5714  957.5385  861.6818  837.0000  969.7222 856.4000  912.5556  977.7273 858.7800 1006.0000 1015.3684 913.2449
2    5203  957.8889  815.2500  845.7750  933.0000  893.0000 883.0435  926.0000  879.2778 813.7308  804.2857  803.8125 843.7200
3    5205 1456.3333 1008.4286  850.7170 1142.4444  910.4706 998.4667  935.2500  980.9167 897.4681 1040.8000  838.7917 819.9710
4    5306 1022.2000  940.5882  904.6562 1525.0000 1216.0000 929.5167  955.8571  981.7500 902.8913  997.6000  924.6818 883.4583
5    5307 1396.1250 1217.1111 1044.4038 1055.5000 1115.6000 980.5833 1003.5714 1482.8571 941.4490 1091.5556 1125.2143 989.4918
6    5308  659.8571  904.2857  966.7755  960.9091 1048.6000 904.5082  836.2000 1753.6667 926.0400  870.2222 1066.6667 930.7500

In the example above for b1 each of the subjects had observations that met the subset demands. However, it can be that for a certain subject I won't have observations after I subset. In this case I want to get NA in b2 for that subject in the specific condition in which he doesn't have observations that meet the subset demands. Does anyone have an idea for a way to do that? Any help will be greatly appreciated.

Best, Ayala

2

There are 2 answers

10
aosmith On BEST ANSWER

There is a drop argument in dcast that you can use in this situation, but you'll need to convert subject to a factor.

Here is a dataset with a second subject ID that has no values that meet your condition that the absolute value of z.score is less than one.

library(reshape2)

bb = data.frame(subject=c(99,99,99,99,99,11,11,11), rt=c(100,150,2,4,10,15,1,2), 
             ac=rep(1,8), condition=c("A","A","B","D","C","C","D","D"),
             z.score=c(0.2,0.3,0.2,0.3,.2,2,2,2))

If you reshape this to a wide format with dcast, you lose subject number 11 even with the drop argument.

dcast(subset(bb, abs(z.score) < 1), subject ~ condition, fun = mean, 
     value.var = "rt", drop = FALSE)

  subject   A B  C D
1      99 125 2 10 4

Make subject a factor.

bb$subject = factor(bb$subject)

Now you can dcast with drop = FALSE to keep all subjects in the wide dataset.

dcast(subset(bb, abs(z.score) < 1), subject ~ condition, fun = mean, 
     value.var = "rt", drop = FALSE) 

  subject   A   B   C   D
1      11 NaN NaN NaN NaN
2      99 125   2  10   4

To get NA instead of NaN you can use the fill argument.

dcast(subset(bb, abs(z.score) < 1), subject ~ condition, fun = mean, 
     value.var = "rt", drop = FALSE, fill = as.numeric(NA)) 

  subject   A  B  C  D
1      11  NA NA NA NA
2      99 125  2 10  4
2
sandro scodelller On

Is it the following you are after? I created a similar dataset "bb"

library("plyr")  ###needed for . function below
bb<- data.frame(subject=c(99,99,99,99,99,11,11,11),rt=c(100,150,2,4,10,15,1,2), ac=rep(1,8) ,condition=c("A","A","B","D","C","C","D","D"),     z.score=c(0.2,0.3,0.2,0.3,1.5,-0.3,0.8,0.7))

bb  
  subject  rt ac condition z.score
#1      99 100  1         A     0.2
#2      99 150  1         A     0.3
#3      99   2  1         B     0.2
#4      99   4  1         D     0.3
#5      99  10  1         C     1.5
#6      11  15  1         C    -0.3
#7      11   1  1         D     0.8
#8      11   2  1         D     0.7

Then you call dcast with subset included:

cc<-dcast(bb,subject~condition, mean, value.var = "rt",subset = .(abs(z.score)<1))  
cc  
   subject   A   B   C   D 
#1      11 NaN NaN  15 1.5
#2      99 125   2 NaN 4.0