partikit predict() returns fewer rows than the input data when predictors have missing values


I'm having a problem with partikit weighted conditional tree models trained on data with missing values.

I'm manually creating a bagged tree model by giving different integer weights to observations at each cycle.
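A bootstrap replicate can be expressed as integer case weights by counting how often each row is drawn when sampling n rows with replacement. The following is a minimal base-R sketch, assuming n = 299 rows as in the question. `make_bootstrap_weights` is a hypothetical helper, and the commented `ctree()` call assumes a data frame `d` with a response `y`.

```r
# Hypothetical helper: one column of integer case weights per bootstrap replicate
make_bootstrap_weights <- function(n, B, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  # tabulate() counts how many times each row index 1..n was drawn
  sapply(seq_len(B), function(b) tabulate(sample.int(n, n, replace = TRUE), nbins = n))
}

w <- make_bootstrap_weights(n = 299, B = 25)
dim(w)      # 299 rows, 25 replicates
colSums(w)  # every replicate sums to 299

# Each column could then be passed as case weights (assumed data frame `d`):
# library(partykit)
# fits <- lapply(seq_len(ncol(w)), function(b) ctree(y ~ ., data = d, weights = w[, b]))
```

Rows that were not drawn in a replicate receive weight 0. This is why different trees can see different subsets of the rows with missing values.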

However, when I used the bootstrapped models to make predictions, I noticed that some of them returned fewer values than there are rows in the input data. Interestingly, with 299 input rows, the prediction length was always either 299 or 289. 289 is the number of rows left after removing the rows with missing predictor values.
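A quick way to confirm that the shortfall matches the incomplete rows is to compare the prediction length with the number of complete cases in the predictor columns. The following is a base-R sketch. `check_prediction_length` is a hypothetical helper, and the toy data and the `short_pred` vector only stand in for a real model's output.

```r
# Hypothetical diagnostic: compare prediction length with the number of complete rows
check_prediction_length <- function(pred, newdata, predictors) {
  c(input     = nrow(newdata),
    complete  = sum(complete.cases(newdata[, predictors, drop = FALSE])),
    predicted = NROW(pred))
}

# Toy example: rows 2 and 3 each have one missing predictor
nd <- data.frame(x1 = c(1, NA, 3, 4, 5),
                 x2 = c("a", "b", NA, "a", "b"),
                 stringsAsFactors = FALSE)
short_pred <- c(0.1, 0.3, 0.4)  # stands in for a predict() result that dropped rows
check_prediction_length(short_pred, nd, c("x1", "x2"))
# input = 5, complete = 3, predicted = 3
```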

Digging into the problem, I found that it arises from the interaction of three components:

  • Using weights in the model;
  • Having missing data in the predictors;
  • Using character variables instead of factors in the input data passed to predict().

If any one of these three conditions is absent, the problem doesn't arise and all trees return 299 values.
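Because the problem reportedly disappears when predict() receives factors rather than character variables, one workaround that leaves partykit untouched is to convert character columns in `newdata` to factors before predicting. The following is a minimal sketch. `align_factors` is a hypothetical helper, and it assumes the training data stored these variables as factors so that their levels can be reused.

```r
# Hypothetical helper: convert character columns in newdata to factors,
# reusing the levels of the matching factor columns in the training data
align_factors <- function(newdata, train) {
  for (v in intersect(names(newdata), names(train))) {
    if (is.character(newdata[[v]]) && is.factor(train[[v]])) {
      newdata[[v]] <- factor(newdata[[v]], levels = levels(train[[v]]))
    }
  }
  newdata
}

train <- data.frame(g = factor(c("a", "b", "c")), x = c(1, 2, 3))
nd <- data.frame(g = c("b", NA, "a"), x = c(1, NA, 3), stringsAsFactors = FALSE)
nd2 <- align_factors(nd, train)
levels(nd2$g)  # "a" "b" "c"
# predict(fit, newdata = nd2) would then receive factors instead of characters
```

Reusing the training levels keeps the factor coding consistent between fitting and prediction. Missing values stay `NA` rather than being dropped.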

Here is the data: https://www.dropbox.com/s/98oriv2msce4wu5/anonym_data.rds?dl=0
Here is a script to reproduce the problem: https://www.dropbox.com/s/5y7g2dwt2838pbp/test.R?dl=0

Answer by malavv

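The links no longer work, but I think you meant partykit. Even though ctree models can handle missing data during fitting, predict.party seems to have trouble with it. Its code calls model.frame() without an explicit na.action. The formal default of that argument is na.fail, but when na.action is not supplied, model.frame() falls back to getOption("na.action"), which is "na.omit" in a standard R session. As a result, rows with missing predictor values are silently dropped.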

I'm not knowledgeable enough to say whether that's a bug, but it seems strange to me, and changing it will likely fix the issue you are seeing. You can download the partykit source code and modify that model.frame() call, adding the option na.action = na.pass.
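The behaviour described above can be reproduced with base R alone. Without an explicit na.action, model.frame() uses getOption("na.action") and drops incomplete rows, whereas na.action = na.pass keeps them. This is a minimal sketch on toy data. It does not call partykit, so whether predict.party then handles the retained NAs correctly is an assumption.

```r
# Base R only: how model.frame() treats rows with missing values
toy <- data.frame(x = c(1, NA, 3), z = c("a", "b", NA), stringsAsFactors = FALSE)
tt <- terms(~ x + z)

getOption("na.action")                            # "na.omit" in a default session
mf_default <- model.frame(tt, data = toy)         # no na.action: falls back to the option
mf_pass    <- model.frame(tt, data = toy, na.action = na.pass)

nrow(mf_default)  # 1: rows 2 and 3 were dropped
nrow(mf_pass)     # 3: all rows kept, NAs left in place
```

Because the fallback reads the global option, temporarily setting options(na.action = "na.pass") around the predict() call might have a similar effect without editing the package source. This is untested here.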

Although I hope you are no longer having this issue 1 year and 5 months later.