I have genetic data for SNPs that has been divided into 5 quantiles. I want to find the median of these quantiles for each SNP (i.e. each person).
I used this command to create a column for median values:
data$median<-apply(data[,2:181],1, median, na.rm=TRUE)
Then I wanted to count how many cases and controls I have for each of my phenotypes, but it looks like it's calculating the median incorrectly. My command is as follows:
table(data$anyMI, data$median)
The output is showing:
       1  1.5     2  2.5     3  3.5     4  4.5     5
0   2044   62  7470  221 11163  248  8389   74  1659
1    102    3   357   11   557   21   404    2    85
I'm not sure why I'm getting half values, when it should only be 1-5, whole numbers. What is going wrong here and why is it showing half-values?
By definition, a median is a value such that half of your sample lies above it and the other half below. As phiver said, if you have an even number of values, call the upper boundary of the lower half x and the lower boundary of the upper half y; any value between x and y can then serve as the median. By default, R reports median = (x+y)/2 in that case. Your columns 2:181 give 180 values per row, an even count, so whenever the two middle values differ (say 2 and 3) you get a half value (2.5).
If you want a value that actually occurs in your dataset, you can use an odd number of observations (remove one, for instance) or round the result.
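To make this concrete, here is a minimal sketch on made-up data (the vector x and the data frame toy are hypothetical, not taken from the question). It shows the default averaging, and one possible alternative: quantile() with type = 1, which returns the lower of the two middle values and therefore always a value that occurs in the data. Whether the lower median is appropriate for your analysis is an assumption you would need to check.

```r
# Toy row with an even number of quintile values (hypothetical data).
x <- c(1, 2, 2, 3, 3, 4)

# Default: the two middle values (2 and 3) are averaged.
med_default <- median(x)

# Lower median: type = 1 always returns a value that occurs in x.
med_lower <- quantile(x, probs = 0.5, type = 1, names = FALSE)

# Rounding also works, but R rounds halves to the even digit,
# so 2.5 becomes 2 while 3.5 becomes 4.
med_rounded <- round(med_default)

print(c(default = med_default, lower = med_lower, rounded = med_rounded))

# The same idea applied row-wise to a small hypothetical data frame.
toy <- data.frame(id = 1:2,
                  s1 = c(1, 5), s2 = c(2, 4), s3 = c(3, 4), s4 = c(4, 1))
toy$median <- apply(toy[, 2:5], 1, quantile, probs = 0.5, type = 1,
                    na.rm = TRUE, names = FALSE)
print(toy)
```

On your own data, the equivalent call would presumably use data[, 2:181] in place of toy[, 2:5]. Note that with na.rm = TRUE the number of values actually used can differ from row to row, which is another reason some rows give half values under the default median and others do not.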