I have a dataframe with four columns: BMI, AGE, SEX and Geno. Geno is a factor variable has 4 levels(A|A,A|G,G|A,ref:G|G). I would like to fit a linear regression model use BMI~AGE + SEX + Geno, and would do a TukeyHSD test for a pairwise comparison between different levels of Geno. The lm model shows Geno is significant, but the TukeyHSD test keeps returning NA. Did I do something wrong or TukeyHSD is not suitable for linear model with both numeric and factor variables? I'm a new to the post-hoc test. If Tukey test is not a good option, any other method I can use? Thanks.
the data frame looks like:
RSID Indiv_ID Indiv_Geno SEX AGE BMI
1 rs1121980 23102 A|A 1 90 34.59993
2 rs1121980 158 A|A 1 58 26.39891
3 rs1121980 9790 A|A 2 90 29.05149
4 rs1121980 22320 G|G 2 73 23.62653
5 rs1121980 10441 G|A 1 59 32.26947
6 rs1121980 10376 G|G 2 90 26.81053
My code is as below:
fto_geno_pheno = na.omit(fto_geno_pheno)
snp_1 = fto_geno_pheno %>%
filter(RSID == "rs1121980")
sapply(snp_1, class)
snp_1$SEX = factor(snp_1$SEX)
snp_1$Indiv_Geno = factor(snp_1$Indiv_Geno)
snp_1$Indiv_Geno = relevel(snp_1$Indiv_Geno, ref = "G|G")
lm = lm(snp_1$BMI ~ snp_1$SEX + snp_1$AGE + snp_1$Indiv_Geno, snp_1)
av = aov(lm)
summary(av)
summary(lm)
TukeyHSD(x = av, which = "Indiv_Geno", conf.level = 0.95)
lm model looks like below:
Df Sum Sq Mean Sq F value Pr(>F)
snp_1$SEX 1 3412 3412 108.399 < 2e-16 ***
snp_1$AGE 1 845 845 26.851 2.25e-07 ***
snp_1$Indiv_Geno 3 770 257 8.157 2.02e-05 ***
Residuals 8424 265171 31
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Call:
lm(formula = snp_1$BMI ~ snp_1$SEX + snp_1$AGE + snp_1$Indiv_Geno,
data = snp_1)
Residuals:
Min 1Q Median 3Q Max
-13.657 -3.885 -0.833 2.892 34.548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.847764 0.285073 104.702 < 2e-16 ***
snp_1$SEX2 -1.248649 0.122730 -10.174 < 2e-16 ***
snp_1$AGE -0.020984 0.004077 -5.147 2.71e-07 ***
snp_1$Indiv_GenoA|A 0.851648 0.178004 4.784 1.74e-06 ***
snp_1$Indiv_GenoA|G 0.145647 0.162781 0.895 0.3710
snp_1$Indiv_GenoG|A 0.329719 0.165168 1.996 0.0459 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.611 on 8424 degrees of freedom
Multiple R-squared: 0.01861, Adjusted R-squared: 0.01802
F-statistic: 31.94 on 5 and 8424 DF, p-value: < 2.2e-16
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm)
$<NA>
diff lwr upr p adj
Warning messages:
1: In replications(paste("~", xx), data = mf) :
non-factors ignored: snp_1$AGE
2: In qtukey(conf.level, length(means), x$df.residual) : NaNs produced```
[1]: https://i.stack.imgur.com/PSdQF.png