I am trying to run several Mann-Whitney U tests that compare the effect of birth population on offspring sex ratio skew. I'm using RStudio. The dataset looks like this:
data <- data.frame(
  DamID = 1:50,
  FemaleOffspring = sample(1:10, 50, replace = TRUE),
  MaleOffspring = sample(1:10, 50, replace = TRUE),
  SexRatio = runif(50, min = 0, max = 1),
  BirthPop = sample(c('A', 'B'), 50, replace = TRUE),
  Species = sample(c('R', 'X', 'Y', 'Z'), 50, replace = TRUE)
)
I've written the following code:
library(dplyr)
sumstats <- data %>%
  group_by(Species, BirthPop) %>%
  summarize(median = median(SexRatio),
            IQR = IQR(SexRatio),
            Min = min(SexRatio),
            Max = max(SexRatio),
            n = n(),
            wilcox_p = wilcox.test(SexRatio ~ factor(BirthPop), data = ., alternative = "two.sided")$p.value
  )
This gives me one p-value for the entire dataset, when I need a separate p-value for each species. I'm not sure what to do about this. Thanks in advance!
Two problems:
1. Use cur_data(). When you use the dot (.), the call to wilcox.test() sees all of the data, and it does not honor the grouping that group_by has imposed.
2. When you group by BirthPop, each call to wilcox.test gets only "A" or only "B", but it needs to see both to be able to perform the test.
I suggest doing two levels of stats: first on both Species and BirthPop (to get the majority of your statistics), and then once on just Species for your Wilcox tests. We can easily bring these back together with a merge/join operation:
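As a minimal sketch of that two-level approach (not necessarily the exact code intended here), something like the following should work. A few things in it are assumptions for illustration: Species and BirthPop are assigned with rep() instead of sample() so that every species is guaranteed to contain both populations, the n_distinct() guard returns NA when a species has only one population, and the names tests and result are made up. Note also that cur_data() was deprecated in dplyr 1.1.0 in favor of pick(); it still runs there, but with a deprecation warning.

```r
library(dplyr)

# Illustrative data only: rep() guarantees each Species has both "A" and "B"
data <- data.frame(
  DamID    = 1:50,
  SexRatio = runif(50, min = 0, max = 1),
  BirthPop = rep(c("A", "A", "A", "A", "B", "B", "B", "B"), length.out = 50),
  Species  = rep(c("R", "X", "Y", "Z"), length.out = 50)
)

# Level 1: descriptive statistics for each Species/BirthPop combination
sumstats <- data %>%
  group_by(Species, BirthPop) %>%
  summarize(
    median = median(SexRatio),
    IQR    = IQR(SexRatio),
    Min    = min(SexRatio),
    Max    = max(SexRatio),
    n      = n(),
    .groups = "drop"
  )

# Level 2: one test per Species, so each call sees both populations.
# cur_data() is the current group's data (dplyr >= 1.1.0: pick(everything())).
tests <- data %>%
  group_by(Species) %>%
  summarize(
    wilcox_p = if (n_distinct(BirthPop) == 2) {
      wilcox.test(SexRatio ~ BirthPop, data = cur_data(),
                  alternative = "two.sided")$p.value
    } else {
      NA_real_  # assumption: report NA rather than fail when one population is missing
    },
    .groups = "drop"
  )

# Bring the two levels back together: wilcox_p is repeated on both rows of a Species
result <- left_join(sumstats, tests, by = "Species")
result
```

The result has one row per Species/BirthPop pair, with the same wilcox_p on both rows of a given species. A left join keeps every row of the summary table even if a species has no test result.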