Error in "missForest" in R

I need help getting around the error below, which occurs while imputing data in R with the missForest package.

> imputed<- missForest(dummy, maxiter = 10, ntree = 100, variablewise = TRUE,
+                      decreasing = TRUE, verbose = TRUE,
+                      mtry = floor(sqrt(ncol(dummy))), replace = TRUE)
Error in sample.int(length(x), size, replace, prob) : 
  invalid first argument

There are 3 answers

0
Fanfoué On

As pointed out by others, missForest() requires input data to be of class data.frame or matrix. If, like many people, you imported or manipulated your data with tidyverse functions, your dataset is likely a tibble (class tbl_df) and needs to be converted with as.data.frame() before imputing the missing values.
Since OP said the data were already in a data.frame, the problem may instead come from the class of the variables. According to this page, the same error message can appear if you have date variables (class Date or difftime). Be sure to work with numeric or factor variables only.
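A minimal sketch of those two checks, using base R only. The data frame below is made up for illustration (it is not OP's data), and turning dates into day counts is just one possible way to make them numeric:

```r
# Hypothetical example data with a numeric, a factor and a Date column.
dummy <- data.frame(
  x = c(1.5, NA, 3.2, 4.1),
  g = factor(c("a", "b", NA, "a")),
  d = as.Date(c("2020-01-01", "2020-01-02", NA, "2020-01-04"))
)

# Inspect the class of the container and of every column.
class(dummy)
sapply(dummy, function(col) class(col)[1])

# Make sure the container is a plain data.frame (drops tbl_df and similar classes).
dummy <- as.data.frame(dummy)

# Convert date/time-like columns to plain numbers (days since 1970-01-01 for Date).
is_time <- sapply(dummy, function(col) inherits(col, c("Date", "difftime", "POSIXt")))
dummy[is_time] <- lapply(dummy[is_time], as.numeric)
```

After this, every column should be numeric or factor, which is what the answer above recommends before calling missForest().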

0
asifkhan On

If you are using fread() to read the data, try read.csv() instead. I had the same problem after reading the data with fread(), even after converting the data.table to a data.frame with as.data.frame(). Once I read the data with read.csv() instead, the problem went away.
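One possible explanation, and this is an assumption rather than something confirmed in this thread: fread() keeps strings as character columns, whereas read.csv() in R versions before 4.0.0 turned them into factors by default. If that is the cause, converting character columns to factors may be enough. A sketch with made-up data:

```r
# Hypothetical data as fread() might return it after as.data.frame():
# the string column is character, not factor.
dummy <- data.frame(
  x = c(1.5, NA, 3.2),
  g = c("a", "b", NA),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; other columns are untouched.
is_chr <- sapply(dummy, is.character)
dummy[is_chr] <- lapply(dummy[is_chr], factor)

# Check the resulting column classes.
sapply(dummy, class)
```

Missing values stay NA after the conversion, so they are still available for imputation.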

4
mrbubu On

I had the same problem. Converting the xmis object (the data passed to missForest()) with as.data.frame() helped. In your case it would be something like:

library(missForest)

# Coerce the input to a plain data.frame before imputing.
dummy <- as.data.frame(dummy)
imputed <- missForest(dummy, maxiter = 10, ntree = 100, variablewise = TRUE,
                      decreasing = TRUE, verbose = TRUE,
                      mtry = floor(sqrt(ncol(dummy))), replace = TRUE)