Logistic regression with NAs and factors returns error


I ran into the following two major problems when running a logistic regression:

My X variables include factor variables, such as immigrant status (immigrant, non-immigrant); my Y variable is a binary variable, low birth weight (non-lbw, lbw).

I ran the following R script (I am using the plsRglm package):

library(plsRglm)
model.plsrglm <- plsRglm(yair, xair, 3, modele="pls-glm-logistic")

1) If I do not drop all the NA values in y or x, R returns this:

summary(model.plsrglm)
Call
plsRglmmodel.default(dataY = yair, dataX = xair, nt = 6, 
modele = "pls-glm-logistic")

> model.plsrglm
Number of required components:
NULL
Number of successfully computed components:
NULL
Coefficients:
NULL
Information criteria and Fit statistics:
NULL

2) If I do drop all the NA values before running the model, R gives an error:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

So should I drop all NA values before fitting the model?

And should I convert the factor variables to numeric? If so, how should I do that, just by using as.numeric? But wouldn't that imply an ordering between non-immigrant and immigrant?

And for the Y variable, should I recode it as 0 and 1?

I have added a reproducible dataset below.

   outcome  c1  c2    c3   c4
1      lbw 120 yes   <30 good
2      lbw 124 yes   <30 good
3      lbw 125 yes   <30 good
4      lbw 135 yes   <30 good
5      lbw 112 yes   <30 good
6      lbw 168 yes   <30 good
7      lbw 147 yes 30-40 good
8      lbw 174 yes 30-40 fair
9      lbw 153 yes 30-40 fair
10     lbw 145 yes 30-40 fair
11     lbw 145 yes 30-40 fair
12     lbw 125  no   >40 fair
13     lbw 125  no   >40 poor
14     lbw 111  no   >40 poor
15 non-lbw  80  no   >40 poor
16 non-lbw  85  no   >40 poor
17 non-lbw  78 yes   >40 poor
18 non-lbw  67  no   >40 poor


xair <- bc1997[,c("c1","c2","c3","c4")]
yair <- bc1997[,"outcome"]

model.plsrglm <- plsRglm(yair, xair, 2, modele="pls-glm-logistic")
summary(model.plsrglm)

But I got this error:

> model.plsrglm <- plsRglm(yair, xair, 2, modele="pls-glm-logistic")
____************************************************____

Family: binomial 
Link function: logit 

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

There is 1 answer

salauer On

Your 'x' terms must be numeric. Your variables "c2", "c3", and "c4" are character or factor columns, not numeric ones.
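A quick way to check the column classes, sketched with base R only. The data frame is retyped from the table in the question, and `stringsAsFactors = TRUE` is an assumption about how the data was originally read in:

```r
# Example data retyped from the question's table (18 rows).
bc1997 <- data.frame(
  outcome = c(rep("lbw", 14), rep("non-lbw", 4)),
  c1 = c(120, 124, 125, 135, 112, 168, 147, 174, 153,
         145, 145, 125, 125, 111, 80, 85, 78, 67),
  c2 = c(rep("yes", 11), rep("no", 5), "yes", "no"),
  c3 = c(rep("<30", 6), rep("30-40", 5), rep(">40", 7)),
  c4 = c(rep("good", 7), rep("fair", 5), rep("poor", 6)),
  stringsAsFactors = TRUE  # assumption: text columns were read as factors
)

xair <- bc1997[, c("c1", "c2", "c3", "c4")]

# Class of each predictor column; only c1 is numeric.
col_classes <- sapply(xair, class)
print(col_classes)

# colMeans() on the mixed data frame fails; capture the message instead of stopping.
msg <- tryCatch(
  { colMeans(xair, na.rm = TRUE); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Any column that is not reported as numeric or integer will need to be recoded before the predictors can be centered and scaled.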

The default setting for scaleX is TRUE, so the function uses colMeans() to scale your predictors, which is not possible with factors. You can therefore either convert each column to numeric or specify scaleX=FALSE.
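One possible way to do the conversion without imposing an artificial ordering on the categories is dummy (indicator) coding with base R's `model.matrix()`. This is only a sketch: the data frame is retyped from the question, the factor level orders are my own choice, and coding "lbw" as 1 is an assumption about which outcome is of interest. The final plsRglm call is left commented out because I have not verified it on this data.

```r
# Example data retyped from the question's table, with explicit factor levels.
bc1997 <- data.frame(
  outcome = c(rep("lbw", 14), rep("non-lbw", 4)),
  c1 = c(120, 124, 125, 135, 112, 168, 147, 174, 153,
         145, 145, 125, 125, 111, 80, 85, 78, 67),
  c2 = factor(c(rep("yes", 11), rep("no", 5), "yes", "no"),
              levels = c("no", "yes")),
  c3 = factor(c(rep("<30", 6), rep("30-40", 5), rep(">40", 7)),
              levels = c("<30", "30-40", ">40")),
  c4 = factor(c(rep("good", 7), rep("fair", 5), rep("poor", 6)),
              levels = c("good", "fair", "poor"))
)

xair <- bc1997[, c("c1", "c2", "c3", "c4")]

# Expand each factor into 0/1 indicator columns (one per non-reference level)
# and drop the intercept column that model.matrix() adds.
x_num <- model.matrix(~ ., data = xair)[, -1, drop = FALSE]

# Recode the outcome as 0/1 (assumption: "lbw" is the event of interest).
y_num <- as.integer(bc1997$outcome == "lbw")

# The predictors are now numeric, so column means can be computed.
x_means <- colMeans(x_num, na.rm = TRUE)
print(x_means)

# Untested with this data; same call shape as in the question:
# library(plsRglm)
# model.plsrglm <- plsRglm(y_num, x_num, 2, modele = "pls-glm-logistic")
```

Unlike as.numeric() applied to a factor, which returns the internal level codes (1, 2, 3, ...), indicator columns do not imply that one category is "greater" than another.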