Issue
I would like to create new columns with dplyr, based on a vector containing the new variable names. The columns should be filled with a constant, for example 0. It is probably very easy, but I can't find a solution. Ideally, I would use one of dplyr's mutate functions (mutate, mutate_at, ...), because I need a solution that also works with Spark dataframes. I would normally use mutate_at(), but it only works if the columns already exist.
Note to those saying it's a duplicate
I can't use tibble::add_column(). Unfortunately, I need a solution that also works with Spark dataframes.
Reproducible example
library("dplyr")
# --- trying to create 3 new columns
# I would like to do something like this :
new_vars = c("var1", "var2", "var3")
mtcars %>%
mutate_at(.vars = new_vars, .funs = ~ 0)
but it generates the error :
Error in `tbl_at_vars()`:
! Can't subset columns that don't exist.
✖ Column `var1` doesn't exist.
Of course, it works fine if the columns already exist:
mtcars %>%
mutate_at(.vars = c('mpg', 'cyl'), .funs = ~ 0)
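Since mutate_at() only selects columns that already exist, one possible workaround is to build a named list of constants and splice it into a plain mutate() call with rlang's `!!!` operator. The sketch below is only checked against a local data frame. The name `zero_cols` is made up for illustration, and the assumption that a Spark backend translates the spliced constants into its generated SQL is untested here.

```r
library(dplyr)

new_vars <- c("var1", "var2", "var3")

# Hypothetical helper object: a named list equivalent to
# list(var1 = 0, var2 = 0, var3 = 0)
zero_cols <- setNames(rep(list(0), length(new_vars)), new_vars)

# `!!!` splices the list so that mutate() sees it as
# mutate(var1 = 0, var2 = 0, var3 = 0)
result <- mtcars %>%
  mutate(!!!zero_cols)

# Inspect only the newly created columns
head(result[, new_vars])
```

Because everything goes through mutate() itself, no dplyr verb is used that a remote backend would not already need to support, but that should be confirmed on an actual Spark connection.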
The obvious data.frame solution (but I need one that also works with Spark dataframes):
new_vars = c("var1", "var2", "var3")
mtcars[, new_vars] = 0
The obvious data.table solution (but I need one that also works with Spark dataframes):
library("data.table")
new_vars = c("var1", "var2", "var3")
mtcars_dt = as.data.table(mtcars)
mtcars_dt[, (new_vars) := 0]
Thank you.
I don't have spark available, but does an arrow connection closely enough approximate what is needed?