I have split my data set into testing and training data sets. I've tried to fit a regression on the training set, and then use predict on the testing set. When I do this I get an error message that says: "Error in model.frame factor x has New Levels". I know this is because there are levels in my testing data not seen in my training data.
What I want to do is just eliminate or ignore the levels that aren't in both data sets. I've tried to do this, but it isn't setting any levels to NA, and the id object says "integer (empty)":
id <- which(!(test$x %in% levels(train$x)))
train$x[id] <- NA
fit <- lm(y ~ x, data=train)
P <- predict(fit,test)
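A minimal reproduction of the error, using made-up data because the question does not show its own. Here the test set contains a value of x ("c") that never occurs in the training set, so predict() refuses to score it:

```r
# Made-up data: "c" appears in the test set but never in the training set
train <- data.frame(x = factor(c("a", "a", "b", "b")), y = c(1.0, 1.2, 2.0, 2.1))
test  <- data.frame(x = factor(c("c", "a")),           y = c(3.0, 0.9))

fit <- lm(y ~ x, data = train)

# predict() checks test$x against the factor levels used in fitting and stops;
# tryCatch() captures the error message instead of aborting the script
err <- tryCatch(predict(fit, test), error = conditionMessage)
err   # the message names factor x and the new level "c"
```

The exact wording of the message can differ slightly between R versions, but it always reports that factor x has a new level.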
You will get a "replacement length differs" error with your code.
id tells you which elements of test$x are not in levels(train$x), so you should use id to index test$x, not train$x, when doing the replacement, i.e. test$x[id] <- NA.
All data in train will be used to build your linear regression model. Some predictions in P will be NA.
Then what is the point of your question? If id is integer(0), every value in test$x is already among levels(train$x), so test$x contains no level that levels(train$x) lacks.