I am running an linear model regression analysis script and I am running emmeans
(ls means) on my model but I am getting a whole of NA's not sure why... Here is what I have run:
setwd("C:/Users/wkmus/Desktop/R-Stuff")
### yeild-twt
ASM_Data<-read.csv("ASM_FIELD_18_SUMM_wm.csv",header=TRUE, na.strings=".")
head(ASM_Data)
str(ASM_Data)
####"NA" values in table are labeled as "." colored orange
ASM_Data$REP <- as.factor(ASM_Data$REP)
head(ASM_Data$REP)
ASM_Data$ENTRY_NO <-as.factor(ASM_Data$ENTRY_NO)
head(ASM_Data$ENTRY_NO)
ASM_Data$RANGE<-as.factor(ASM_Data$RANGE)
head(ASM_Data$RANGE)
ASM_Data$PLOT_ID<-as.factor(ASM_Data$PLOT_ID)
head(ASM_Data$PLOT_ID)
ASM_Data$PLOT<-as.factor(ASM_Data$PLOT)
head(ASM_Data$PLOT)
ASM_Data$ROW<-as.factor(ASM_Data$ROW)
head(ASM_Data$ROW)
ASM_Data$REP <- as.numeric(as.character(ASM_Data$REP))
head(ASM_Data$REP)
ASM_Data$TWT_g.li <- as.numeric(as.character(ASM_Data$TWT_g.li))
ASM_Data$Yield_kg.ha <- as.numeric(as.character(ASM_Data$Yield_kg.ha))
ASM_Data$PhysMat_Julian <- as.numeric(as.character(ASM_Data$PhysMat_Julian))
ASM_Data$flowering <- as.numeric(as.character(ASM_Data$flowering))
ASM_Data$height <- as.numeric(as.character(ASM_Data$height))
ASM_Data$CLEAN.WT <- as.numeric(as.character(ASM_Data$CLEAN.WT))
ASM_Data$GRAV.TEST.WEIGHT <-as.numeric(as.character(ASM_Data$GRAV.TEST.WEIGHT))
str(ASM_Data)
library(lme4)
#library(lsmeans)
library(emmeans)
Here is the data frame:
> str(ASM_Data)
'data.frame': 270 obs. of 20 variables:
$ TRIAL_ID : Factor w/ 1 level "18ASM_OvOv": 1 1 1 1 1 1 1 1 1 1 ...
$ PLOT_ID : Factor w/ 270 levels "18ASM_OvOv_002",..: 1 2 3 4 5 6 7 8 9 10 ...
$ PLOT : Factor w/ 270 levels "2","3","4","5",..: 1 2 3 4 5 6 7 8 9 10 ...
$ ROW : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ RANGE : Factor w/ 15 levels "1","2","3","4",..: 2 3 4 5 6 7 8 9 10 12 ...
$ REP : num 1 1 1 1 1 1 1 1 1 1 ...
$ MP : int 1 1 1 1 1 1 1 1 1 1 ...
$ SUB.PLOT : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 2 3 ...
$ ENTRY_NO : Factor w/ 139 levels "840","850","851",..: 116 82 87 134 77 120 34 62 48 136 ...
$ height : num 74 70 73 80 70 73 75 68 65 68 ...
$ flowering : num 133 133 134 134 133 131 133 137 134 132 ...
$ CLEAN.WT : num 1072 929 952 1149 1014 ...
$ GRAV.TEST.WEIGHT : num 349 309 332 340 325 ...
$ TWT_g.li : num 699 618 663 681 650 684 673 641 585 646 ...
$ Yield_kg.ha : num 2073 1797 1841 2222 1961 ...
$ Chaff.Color : Factor w/ 3 levels "Bronze","Mixed",..: 1 3 3 1 1 1 1 3 1 3 ...
$ CHAFF_COLOR_SCALE: int 2 1 1 2 2 2 2 1 2 1 ...
$ PhysMat : Factor w/ 3 levels "6/12/2018","6/13/2018",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PhysMat_Julian : num 163 163 163 163 163 163 163 163 163 163 ...
$ PEDIGREE : Factor w/ 1 level "OVERLEY/OVERLAND": 1 1 1 1 1 1 1 1 1 1 ...
This is the head of ASM Data:
head(ASM_Data)
`TRIAL_ID PLOT_ID PLOT ROW RANGE REP MP SUB.PLOT ENTRY_NO height flowering CLEAN.WT GRAV.TEST.WEIGHT TWT_g.li`
1 18ASM_OvOv 18ASM_OvOv_002 2 1 2 1 1 A 965 74 133 1071.5 349.37 699
2 18ASM_OvOv 18ASM_OvOv_003 3 1 3 1 1 A 931 70 133 928.8 309.13 618
3 18ASM_OvOv 18ASM_OvOv_004 4 1 4 1 1 A 936 73 134 951.8 331.70 663
4 18ASM_OvOv 18ASM_OvOv_005 5 1 5 1 1 A 983 80 134 1148.6 340.47 681
5 18ASM_OvOv 18ASM_OvOv_006 6 1 6 1 1 B 926 70 133 1014.0 324.95 650
6 18ASM_OvOv 18ASM_OvOv_007 7 1 7 1 1 B 969 73 131 1076.6 342.09 684
Yield_kg.ha Chaff.Color CHAFF_COLOR_SCALE PhysMat PhysMat_Julian PEDIGREE
1 2073 Bronze 2 6/12/2018 163 OVERLEY/OVERLAND
2 1797 White 1 6/12/2018 163 OVERLEY/OVERLAND
3 1841 White 1 6/12/2018 163 OVERLEY/OVERLAND
4 2222 Bronze 2 6/12/2018 163 OVERLEY/OVERLAND
5 1961 Bronze 2 6/12/2018 163 OVERLEY/OVERLAND
6 2082 Bronze 2 6/12/2018 163 OVERLEY/OVERLAND
I am looking at a linear model dealing with test weight.
This is what I ran:
ASM_Data$TWT_g.li <- as.numeric(as.character((ASM_Data$TWT_g.li))) head(ASM_Data$TWT_g.li)
ASM_YIELD_1 <- lm(TWT_g.li~ENTRY_NO + REP + SUB.BLOCK, data=ASM_Data)
anova(ASM_YIELD_1)
summary(ASM_YIELD_1)
emmeans(ASM_YIELD_1, "ENTRY_NO") ###########ADJ. MEANS
I get an output for anova
anova(ASM_YIELD_1)
Analysis of Variance Table
Response: TWT_g.li
Df Sum Sq Mean Sq F value Pr(>F)
ENTRY_NO 138 217949 1579 7.0339 < 2e-16 ***
REP 1 66410 66410 295.7683 < 2e-16 ***
SUB.BLOCK 4 1917 479 2.1348 0.08035 .
Residuals 125 28067 225
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
but for emmeans I get something like this:
ENTRY_NO emmean SE df asymp.LCL asymp.UCL
840 nonEst NA NA NA NA
850 nonEst NA NA NA NA
851 nonEst NA NA NA NA
852 nonEst NA NA NA NA
853 nonEst NA NA NA NA
854 nonEst NA NA NA NA
855 nonEst NA NA NA NA
857 nonEst NA NA NA NA
858 nonEst NA NA NA NA
859 nonEst NA NA NA NA
I do have outliers in my data which is indicated by a "." in my data but that's the only thing I can think of which is off.
When I run with(ASM_Data, table(ENTRY_NO, REP, SUB.BLOCK))
this is what I have:
with(ASM_Data, table(ENTRY_NO,REP,SUB.BLOCK))
, , SUB.BLOCK = A
REP
ENTRY_NO 1 2
840 0 0
850 0 0
851 0 0
852 0 0
853 0 0
854 0 0
855 0 0
857 0 0
858 0 0
859 0 0
860 0 0
861 0 0
862 0 0
863 1 0
864 0 0
865 1 0
866 1 0
867 0 0
868 0 0
869 1 0
870 1 0
871 0 0
872 0 0
873 0 0
874 0 0
875 0 0
876 0 0
877 0 0
878 0 0
879 1 0
880 0 0
881 0 0
882 0 0
883 0 0
884 0 0
885 1 0
886 0 0
887 1 0
888 1 0
889 1 0
890 0 0
891 1 0
892 0 0
893 0 0
894 0 0
895 0 0
896 1 0
897 0 0
898 0 0
899 0 0
900 1 0
901 1 0
902 0 0
903 0 0
904 1 0
905 1 0
906 0 0
907 1 0
908 1 0
909 0 0
910 0 0
911 0 0
912 0 0
913 0 0
914 0 0
915 0 0
916 1 0
917 0 0
918 0 0
919 1 0
920 0 0
921 0 0
922 0 0
923 1 0
924 0 0
925 0 0
926 0 0
927 1 0
928 0 0
929 0 0
930 0 0
931 1 0
932 0 0
933 0 0
934 0 0
935 0 0
936 1 0
937 0 0
938 1 0
939 1 0
940 0 0
941 1 0
942 0 0
943 1 0
944 0 0
945 0 0
946 0 0
947 0 0
948 1 0
949 0 0
950 1 0
951 0 0
952 0 0
953 0 0
954 0 0
955 1 0
956 1 0
957 1 0
958 1 0
959 0 0
960 0 0
961 0 0
962 0 0
963 0 0
964 0 0
965 1 0
966 0 0
967 1 0
968 0 0
969 0 0
970 1 0
971 0 0
972 0 0
973 0 0
974 1 0
975 0 0
976 0 0
977 0 0
978 1 0
979 0 0
980 0 0
981 0 0
982 0 0
983 1 0
984 1 0
985 0 0
986 1 0
987 3 0
988 0 0
, , SUB.BLOCK = B
REP
ENTRY_NO 1 2
840 0 0
850 0 0
851 0 0
852 0 0
853 1 0
854 0 0
855 0 0
857 0 0
858 0 0
859 0 0
860 0 0
861 1 0
862 0 0
863 0 0
864 0 0
865 0 0
866 0 0
867 0 0
868 0 0
869 0 0
870 0 0
871 1 0
872 0 0
873 0 0
874 0 0
875 0 0
876 1 0
877 1 0
878 1 0
879 0 0
880 1 0
881 0 0
882 1 0
883 1 0
884 1 0
885 0 0
886 0 0
887 0 0
888 0 0
889 0 0
890 1 0
891 0 0
892 1 0
893 1 0
894 1 0
895 1 0
896 0 0
897 1 0
898 0 0
899 0 0
900 0 0
901 0 0
902 1 0
903 0 0
904 0 0
905 0 0
906 0 0
907 0 0
908 0 0
909 1 0
910 0 0
911 1 0
912 0 0
913 1 0
914 0 0
915 0 0
916 0 0
917 0 0
918 0 0
919 0 0
920 1 0
921 1 0
922 0 0
923 0 0
924 0 0
925 1 0
926 1 0
927 0 0
928 0 0
929 0 0
930 1 0
931 0 0
932 1 0
933 0 0
934 1 0
935 0 0
936 0 0
937 1 0
938 0 0
939 0 0
940 1 0
941 0 0
942 0 0
943 0 0
944 0 0
945 1 0
946 0 0
947 1 0
948 0 0
949 0 0
950 0 0
951 1 0
952 0 0
953 0 0
954 1 0
955 0 0
956 0 0
957 0 0
958 0 0
959 1 0
960 0 0
961 0 0
962 1 0
963 0 0
964 0 0
965 0 0
966 0 0
967 0 0
968 0 0
969 1 0
970 0 0
971 0 0
972 0 0
973 0 0
974 0 0
975 0 0
976 1 0
977 1 0
978 0 0
979 0 0
980 0 0
981 1 0
982 1 0
983 0 0
984 0 0
985 3 0
986 0 0
987 1 0
988 1 0
, , SUB.BLOCK = C
REP
ENTRY_NO 1 2
840 0 0
850 0 0
851 0 0
852 0 0
853 0 0
854 0 0
855 0 0
857 1 0
858 0 0
859 1 0
860 0 0
861 0 0
862 1 0
863 0 0
864 0 0
865 0 0
866 0 0
867 0 0
868 0 0
869 0 0
870 0 0
871 0 0
872 1 0
873 0 0
874 0 0
875 0 0
876 0 0
877 0 0
878 0 0
879 0 0
880 0 0
881 1 0
882 0 0
883 0 0
884 0 0
885 0 0
886 1 0
887 0 0
888 0 0
889 0 0
890 0 0
891 0 0
892 0 0
893 0 0
894 0 0
895 0 0
896 0 0
897 0 0
898 1 0
899 1 0
900 0 0
901 0 0
902 0 0
903 1 0
904 0 0
905 0 0
906 1 0
907 0 0
908 0 0
909 0 0
910 1 0
911 0 0
912 1 0
913 0 0
914 1 0
915 1 0
916 0 0
917 1 0
918 1 0
919 0 0
920 0 0
921 0 0
922 1 0
923 0 0
924 1 0
925 0 0
926 0 0
927 0 0
928 1 0
929 1 0
930 0 0
931 0 0
932 0 0
933 1 0
934 0 0
935 1 0
936 0 0
937 0 0
938 0 0
939 0 0
940 0 0
941 0 0
942 1 0
943 0 0
944 1 0
945 0 0
946 1 0
947 0 0
948 0 0
949 1 0
950 0 0
951 0 0
952 1 0
953 1 0
954 0 0
955 0 0
956 0 0
957 0 0
958 0 0
959 0 0
960 1 0
961 1 0
962 0 0
963 1 0
964 1 0
965 0 0
966 1 0
967 0 0
968 1 0
969 0 0
970 0 0
971 1 0
972 1 0
973 1 0
974 0 0
975 1 0
976 0 0
977 0 0
978 1 0
979 2 0
980 0 0
981 0 0
982 0 0
983 0 0
984 0 0
985 1 0
986 3 0
987 0 0
988 0 0
, , SUB.BLOCK = D
REP
ENTRY_NO 1 2
840 0 0
850 0 0
851 0 0
852 0 1
853 0 0
854 0 0
855 0 0
857 0 0
858 0 1
859 0 0
860 0 1
861 0 0
862 0 0
863 0 0
864 0 1
865 0 0
866 0 0
867 0 0
868 0 0
869 0 0
870 0 0
871 0 0
872 0 0
873 0 0
874 0 0
875 0 1
876 0 0
877 0 0
878 0 1
879 0 0
880 0 1
881 0 1
882 0 1
883 0 1
884 0 1
885 0 0
886 0 0
887 0 0
888 0 0
889 0 0
890 0 0
891 0 0
892 0 1
893 0 0
894 0 0
895 0 0
896 0 0
897 0 1
898 0 0
899 0 1
900 0 0
901 0 0
902 0 1
903 0 0
904 0 0
905 0 0
906 0 0
907 0 0
908 0 0
909 0 0
910 0 0
911 0 0
912 0 0
913 0 1
914 0 1
915 0 1
916 0 0
917 0 1
918 0 1
919 0 0
920 0 0
921 0 1
922 0 1
923 0 0
924 0 0
925 0 0
926 0 0
927 0 0
928 0 0
929 0 1
930 0 1
931 0 0
932 0 0
Can someone please give me an idea of what is going wrong??
Thanks !
I've been able to create a situation like this. Consider this dataset:
This has 6 complete blocks, separately labeled with 3 blocks per rep. Not obvious, but true, is that
rep
is a numeric variable having values1
and2
, whileblk
is a factor having 6 levels1
--6
:With this complete dataset, I have no problem obtaining EMMs for modeling situations parallel to what was used in the original posting. However, if I use only a subset of these data, it is different. Consider:
Again, recall that
rep
is numeric in this model. If instead, I makerep
a factor:We get non-
NA
estimates, in part because it is able to detect the fact thatblk
is nested inrep
, and thus performs the EMM computations separately in each rep. Note in the annotations in this last output that averaging is done over the 2 reps and 6 blocks; whereas infiber.lm
averaging is done only over blocks, whilerep
, a numeric variable, is set at its average. Compare the reference grids for the two models:An additional option is to avoid the nesting issue by simply omitting
rep
from the model:Note that exactly the same results are produced. The reason is the levels of
blk
already predict the levels ofrep
, so there is no need for it to be in the model.In summary:
rep
was in the model as a numeric predictor rather than a factor.factor(REP)
instead ofREP
as a numeric predictor. This may be enough to produce estimates.SUB.BLOCK
levels predict theREP
levels, just leaveREP
out of the model altogether.