For example, I have data like this:
genotype=rep(c("A","B","C","D","E"), each=5)
env=rep(c(10,20,30,40,50), times=5)
outcome=c(10,15,17,19,22,12,13,15,18,25,10,11,12,13,18,11,15,20,22,28,10,9,10,12,15)
dataA=data.frame(genotype,env,outcome)
I would like to fit a linear model of outcome on env for each genotype, and also calculate the RMSE. For genotype A, I used this code:
A=anova(lm(outcome~env,data=subset(dataA, genotype=="A")))
##
Response: outcome
          Df Sum Sq Mean Sq F value   Pr(>F)
env        1   78.4  78.400      84 0.002746 **
Residuals  3    2.8   0.933
##
A_rmse=sqrt(0.933)
A_rmse= 0.9659193
I need to do the same for genotypes B through E, but my actual data has more than 100 genotypes, so calculating them one by one is not practical. I'd like to know how to automatically calculate the RMSE (the square root of the residual mean square in the anova table) for each genotype.
Could you let me know how to do it?
Always many thanks,
One approach is to use nest() to nest the data by genotype. This allows you to create a data frame that contains the results of anova() in a list column. You can then use broom::tidy() to extract the Mean Sq values and calculate the RMSE.
The basis for this is the excellent tutorial Running a model on separate groups.
First, install the packages if required, then load:
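The original code for this step is not shown here, so the following is only a sketch. It assumes the functions used in this answer come from dplyr, tidyr, purrr and broom; loading the tidyverse meta-package plus broom would presumably work as well.

```r
# Assumed package set (not confirmed by the original answer).
# Uncomment the next line if the packages are not installed yet.
# install.packages(c("dplyr", "tidyr", "purrr", "broom"))

library(dplyr)  # mutate(), filter(), select(), %>%
library(tidyr)  # nest(), unnest()
library(purrr)  # map()
library(broom)  # tidy()
```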
Here's what nesting the data by genotype looks like:
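A minimal sketch of the nesting step, rebuilding the example data from the question so it runs on its own. The name nested is just an illustrative choice, and the data = -genotype form assumes tidyr 1.0 or later.

```r
library(tidyr)

# Example data from the question
genotype <- rep(c("A", "B", "C", "D", "E"), each = 5)
env <- rep(c(10, 20, 30, 40, 50), times = 5)
outcome <- c(10, 15, 17, 19, 22, 12, 13, 15, 18, 25, 10, 11, 12, 13, 18,
             11, 15, 20, 22, 28, 10, 9, 10, 12, 15)
dataA <- data.frame(genotype, env, outcome)

# One row per genotype; the env and outcome values for that genotype
# are stored as a small data frame in the list column "data"
nested <- nest(dataA, data = -genotype)
nested
```

Each element of nested$data can then be passed to lm() like any ordinary data frame.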
Now we can use mutate() and map() to add columns with the anova() results, and the metrics extracted by tidy().
Now we unnest() the results column and filter() for the residuals term.
And finally, calculate RMSE using sqrt(meansq), then select the desired columns.
So the entire process looks like this:
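The original code block is missing here, so what follows is a reconstruction of the steps described above rather than the exact code. The column names model and results and the object name rmse_by_genotype are assumptions chosen for illustration; term and meansq are the column names that broom::tidy() returns for an anova table.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Example data from the question
genotype <- rep(c("A", "B", "C", "D", "E"), each = 5)
env <- rep(c(10, 20, 30, 40, 50), times = 5)
outcome <- c(10, 15, 17, 19, 22, 12, 13, 15, 18, 25, 10, 11, 12, 13, 18,
             11, 15, 20, 22, 28, 10, 9, 10, 12, 15)
dataA <- data.frame(genotype, env, outcome)

rmse_by_genotype <- dataA %>%
  # one row per genotype, with its data in a list column
  nest(data = -genotype) %>%
  # fit the model per genotype and tidy each anova table
  mutate(
    model   = map(data, ~ anova(lm(outcome ~ env, data = .x))),
    results = map(model, tidy)
  ) %>%
  # expand the tidied tables and keep only the residuals row
  unnest(results) %>%
  filter(term == "Residuals") %>%
  # RMSE = square root of the residual mean square
  mutate(rmse = sqrt(meansq)) %>%
  select(genotype, rmse)

rmse_by_genotype
```

The result has one row per genotype. For genotype A the value is computed from the unrounded mean square (2.8 / 3), so it differs slightly from sqrt(0.933) in the question. As a cross-check, sigma() applied to the fitted lm object for a single genotype should return the same number.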