mlr3 - Editing `task$data()`


Is there a way to edit `task$data()`, or to replace it with a new `data.frame` that has exactly the same column names?

I've tried `task_train$data() <- newDF` and `task_train$data <- newDF`. They fail with `Error in task_train$data() <- di : invalid (NULL) left side of assignment` and `Error in task_train$data <- newDF: cannot change value of locked binding for 'data'`, respectively.
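The second error comes from how R6 objects protect their methods (mlr3 tasks are R6 objects): `data` is a method, and R6 locks the bindings of methods, so they cannot be overwritten. A minimal sketch with a hypothetical toy class (an illustration only, not mlr3's actual `Task` implementation) reproduces the same message:

```r
library("R6")

# Hypothetical stand-in for an mlr3 Task; only the data() method matters here
ToyTask = R6Class("ToyTask",
  public = list(
    initialize = function(df) {
      private$.df = df
    },
    # like task$data(), a method that returns the stored data
    data = function() private$.df
  ),
  private = list(.df = NULL)
)

toy = ToyTask$new(data.frame(x = 1:3))

# Overwriting the method fails because R6 locks the bindings of methods
msg = tryCatch({
  toy$data = data.frame(x = 4:6)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The method itself stays intact (`toy$data()` still returns the original data), which is why the data has to be changed through other means.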


Accepted answer by missuse

Once you have created the task, all further data transformations, augmentations, etc. should be performed using pipelines (mlr3pipelines). This is especially handy when resampling or tuning, because the preprocessing is then fitted within each resampling iteration, which avoids data leakage.
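A minimal sketch of this idea, assuming the built-in `iris` task and the `classif.rpart` learner that ship with mlr3; `po("scale")` is just an illustrative preprocessing step. Wrapping preprocessing and learner into one `GraphLearner` means the scaling is estimated on each training split only, and the task is never edited by hand:

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Preprocessing + learner combined into a single learner;
# po("scale") is fitted anew on each training split
glrn = as_learner(po("scale") %>>% lrn("classif.rpart"))

# 3-fold cross-validation on the unmodified task
rr = resample(task, glrn, rsmp("cv", folds = 3))
ce = rr$aggregate(msr("classif.ce"))
print(ce)
```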

Based on the comment by @pat-s, this is not only my opinion but also that of the core mlr team, and it is the reason why editing the task data directly (in the ways shown in the question) fails.

Answer by mikoontz

One use case is swapping in knockoff data for the real data to measure the conditional effects of features, as in the {cpi} package. This keeps the other key parts of the task (e.g., weights, coordinates) intact and modifies only the data itself.

`mlr_pipeops_mutate` gets us what we want (see its help page, `?mlr_pipeops_mutate`):

library("mlr3")
library("mlr3pipelines")

constant = 1  # a global variable can be referenced inside the formulas
pom = mlr3pipelines::po("mutate")
# each named formula creates (or overwrites) a feature column
pom$param_set$values$mutation = list(
  Sepal.Length_plus_constant = ~ Sepal.Length + constant,
  Sepal.Area = ~ Sepal.Width * Sepal.Length,
  Petal.Area = ~ Petal.Width * Petal.Length,
  Sepal.Area_plus_Petal.Area = ~ Sepal.Area + Petal.Area,
  Sepal.Width = ~ Sepal.Width / 2.54 # modify column in place (needs ~ like the others)
)
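To see the effect, the PipeOp can be trained on a task directly. A small self-contained sketch, assuming the built-in `iris` task and a reduced set of mutations (one new feature plus the in-place rescaling of `Sepal.Width`):

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

pom = po("mutate")
pom$param_set$values$mutation = list(
  Sepal.Area = ~ Sepal.Width * Sepal.Length,  # new feature
  Sepal.Width = ~ Sepal.Width / 2.54          # overwrite an existing feature
)

# PipeOp$train() takes and returns a list; its first element is the new task
task_mutated = pom$train(list(task))[[1]]

print(task_mutated$feature_names)
print(head(task_mutated$data()))
```

The original `task` is left unchanged, since the PipeOp works on a copy; in practice the same PipeOp would be chained in front of a learner with `%>>%` so the mutation is re-applied during resampling.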