I have split my data set into testing and training data sets. I've tried to fit a regression on the training set, and then use predict on the testing set. When I do this I get an error message that says: "Error in model.frame factor x has New Levels". I know this is because there are levels in my testing data not seen in my training data.
What I want to do is just eliminate or ignore the levels that aren't in both data sets. I've tried to do this, but it isn't setting any levels to NA, and the `id` object says "integer (empty)":
id <- which(!(test$x %in% levels(train$x)))
train$x[id] <- NA
fit <- lm(y ~ x, data=train)
P <- predict(fit,test)
You will get a "replacement length differs" error with your code. `id` tells you which elements of `test$x` are not in `levels(train$x)`, so you should use `id` to index `test$x`, not `train$x`, when doing the replacement. All data in `train` will be used to build your linear regression model. Some predictions in `P` will be `NA`.

As for `id` being "integer (empty)": then what is the point of your question? All levels in `test$x` are inside `levels(train$x)`, and there is no new level.
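
For reference, here is a minimal sketch of the corrected replacement described above. The toy `train` and `test` data frames are made up for illustration (the level "d" is an assumed unseen level); only the `id`, `fit` and `P` lines mirror the question's code.

```r
# Hypothetical toy data: level "d" occurs only in the test set.
train <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x = factor(c("a", "a", "b", "b", "c", "c"))
)
test <- data.frame(x = factor(c("a", "b", "d")))

# Positions in test$x whose value is not a level of train$x.
id <- which(!(test$x %in% levels(train$x)))

# Set the unseen values to NA in the *test* set, not the training set.
test$x[id] <- NA

# Optional tidy-up: re-level test$x so it carries exactly the training levels.
test$x <- factor(test$x, levels = levels(train$x))

# Fit on all training rows; rows of test with NA in x get NA predictions.
fit <- lm(y ~ x, data = train)
P <- predict(fit, test)
```

With this toy data, `id` points at the third row of `test`, and the third element of `P` comes out as `NA` while the first two are ordinary fitted values. If `id` is empty on your real data, there is nothing to replace and `predict` should run without the new-levels error.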