modelr add_predictions error: in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)


I am getting the following error when using the modelr add_predictions function.

modelr add_predictions error: in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): fe.lead.surgeon has new levels ....

As I understand it, this is a common issue when you fit a model on a training dataset and apply it to a test dataset: the test dataset may contain factor levels that were not present in the training dataset. However, I am using the same sample both to fit the model and to get the predicted values, and I still get this error.

Here is the code I am using. I would appreciate any insight into why this error occurs and how to solve it.

# indep is a vector of independent variable names
# dep is a vector of dependent variable names
# id.case is the id variable
# sample is my dataset.

  eq <- 
    paste(indep, collapse = ' + ') %>%
    paste(dep, ., sep = ' ~ ') %>%
    as.formula  
  
  s <-
    lm(eq, data = sample %>% select(-id.case))
  
  pred <- 
    sample %>% 
    modelr::add_predictions(s) %>% 
    select(id.case, pred) 

At the request of @SimoneBianchi, I am providing a reproducible example here.

Reproducible example

  library(tidyverse)
  library(tibble)
  library(data.table)
  
  rename <- dplyr::rename
  select <- dplyr::select
  
  set.seed(10002)
  id <- sample(1:1000, 1000, replace=F)
  
  set.seed(10003)
  fe1 <- sample(c('A','B','C'), 1000, replace=T)
  
  set.seed(10001)
  fe2 <- sample(c('a','b','c'), 1000, replace=T)
  
  set.seed(10001)
  cont1 <- sample(1:300, 1000, replace=T)
  
  set.seed(10004)
  value <- sample(1:30, 1000, replace=T)
  
  sample <-   
    data.frame(id, fe1, fe2, cont1, value) 

  dep <- 'value'
  
  indep <- 
    c('fe1','fe2', 'cont1')
  
  
  eq <- 
    paste(indep, collapse = ' + ') %>%
    paste(dep, ., sep = ' ~ ') %>%
    as.formula  
  
  s <-
    lm(eq, data = sample %>% select(-id))
  
  pred <- 
    sample %>% 
    modelr::add_predictions(s) %>% 
    select(id, pred)

Update and Workaround

One workaround I found is to use the fitted function instead of the modelr function. However, I would still like to learn why the regression automatically drops some factor levels from a factor variable. If anyone knows, please leave a comment.

  pred <- 
    sample %>% 
    cbind(pred = fitted(s))
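
One caveat with this workaround: if lm drops any rows while fitting (for example, rows with missing values), fitted returns fewer values than the data has rows, and cbind can no longer line them up. A minimal sketch of one way around that, assuming a hypothetical toy data frame d (not the dataset above) and using na.action = na.exclude so that fitted pads the dropped rows with NA:

```r
# Hypothetical toy data: row 3 has a missing predictor value.
d <- data.frame(
  id    = 1:6,
  x     = c(1, 2, NA, 4, 5, 6),
  value = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)
)

# na.exclude still leaves row 3 out of the fit, but records its position.
fit <- lm(value ~ x, data = d, na.action = na.exclude)

# fitted() now returns one value per original row, with NA for row 3,
# so it can be attached to the data directly.
d$pred <- fitted(fit)
d
```

With the default na.action (na.omit), fitted(fit) would have length 5 here instead of 6.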

Closing: Problem found with the dataset

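I found that some observations had NA values, and those rows carried factor levels that appeared nowhere else. Since lm drops rows with NA when fitting, those levels were missing from the model and showed up as new levels at prediction time, which caused the error. After I fixed the NA values, the original code worked fine. So it was a problem with the dataset rather than the code!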
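
For anyone hitting the same thing, here is a minimal sketch of the mechanism, using hypothetical toy data rather than my real dataset. It calls base predict directly, which is what add_predictions relies on under the hood. The variable names mirror the example above, but the values are made up.

```r
# Hypothetical toy data: level "C" of fe1 occurs only in rows where cont1 is NA.
d <- data.frame(
  fe1   = factor(c("A", "B", "A", "B", "A", "B", "C", "C")),
  cont1 = c(1, 2, 3, 4, 5, 6, NA, NA),
  value = c(2, 4, 5, 9, 8, 13, 7, 9)
)

fit <- lm(value ~ fe1 + cont1, data = d)

# lm() drops the two NA rows (default na.action is na.omit) and, with them,
# level "C": the model only knows about "A" and "B".
fit$xlevels$fe1
levels(d$fe1)

# Predicting on the full data meets a level the model never saw.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  predict(fit, newdata = d),
  error = function(e) conditionMessage(e)
)
res

# One option: predict only on the rows the model could actually use.
used <- d[complete.cases(d[, c("fe1", "cont1", "value")]), ]
used$pred <- predict(fit, newdata = used)
used
```

Comparing fit$xlevels with the levels in the data is a quick way to check whether this is what is happening before digging further.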

Thank you all for trying to help me out.
