mlr3 - how to remove incomplete observations using `mlr3` interface


Is it possible to remove incomplete observations from a task, e.g. `task <- TaskRegr$new("data", data, "y")`, using mlr3 filters or pipeops?

Answer by damir (best answer):

I don't think there is a preprocessing operator for removing observations.

What I would do is use the `$filter()` method of the Task.

Example:

library(mlr3)

t = tsk("pima")
# logical vector: TRUE for rows without missing values
ids = complete.cases(t$data())

# number of incomplete observations
sum(!ids)

# filter() expects row ids, so map the logical vector onto the task's row ids
t$filter(t$row_ids[ids])

# number of incomplete observations
# should be zero now
ids = complete.cases(t$data())
sum(!ids)

`complete.cases()` returns a logical vector indicating which rows are complete (contain no `NA`s). `$filter()` subsets the task to the row ids passed as its argument. Rows whose ids are not passed are removed in place, so the task object itself is modified.
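Because the filtering happens in place, it can be handy to work on a copy if the full task is still needed afterwards. A minimal sketch, assuming the built-in `pima` task; the names `task_complete`, `keep` and `n_before` are only illustrative:

```r
library(mlr3)

task = tsk("pima")
n_before = task$nrow

# deep copy, so that filtering does not touch the original task
task_complete = task$clone(deep = TRUE)

# TRUE for rows without any NA
keep = complete.cases(task_complete$data())

# filter() takes row ids, so translate the logical vector into ids
task_complete$filter(task_complete$row_ids[keep])

task$nrow                     # still the full number of rows
task_complete$nrow            # only the complete rows
anyNA(task_complete$data())   # no missing values left in the copy
```

Indexing `row_ids` with the logical vector, rather than using `which()`, should also work for tasks whose row ids do not simply run from 1 to n.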

If you would rather impute the missing values than drop the observations, mlr3pipelines provides several imputation operators, such as `PipeOpImputeConstant`, which replaces missing feature values with a constant.
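As a rough sketch of the imputation route, assuming the mlr3pipelines package is installed: the constant `-999` and the choice of the `glucose` column are arbitrary, picked only for illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("pima")
task$missings()   # missing values per column before imputation

# impute missing values of "glucose" with an arbitrary constant
po_imp = po("imputeconstant", param_vals = list(
  constant = -999,
  affect_columns = selector_name("glucose")
))

# train() takes a list of inputs and returns a list of outputs
task_imputed = po_imp$train(list(task))[[1]]
task_imputed$missings()   # "glucose" should no longer have missing values
```

Unlike filtering, this keeps every row. Other imputers, for example `po("imputemedian")`, are used in the same way.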