Can we avoid repeating the column name in every mutate() call?


I use several mutate() calls on the same column. How can I use mutate() only once, without repeating the column name?

library(dplyr)

df <- data.frame(
  c1 = c("Élève", "Café", "Château", "Noël", "Crème")
)

df2 <- df %>% 
  mutate(c1 = trimws(c1)) %>%
  mutate(c1 = gsub("\\s+", " ", c1)) %>%
  mutate(c1 = gsub("\"", "", c1)) %>%
  mutate(c1 = iconv(toupper(c1), to = "ASCII//TRANSLIT"))
  

There are 3 answers

G. Grothendieck (best answer)

Place the whole pipeline inside a single mutate() call, like this:

df3 <- df %>%
  mutate(c1 = c1 %>%
    trimws %>%
    gsub("\\s+", " ", .) %>%
    gsub("\"", "", .) %>%
    toupper %>%
    iconv(to = "ASCII//TRANSLIT"))

identical(df2, df3)
## [1] TRUE
Mark

You can use pipes inside mutate() calls. And even if you couldn't, a column created or modified earlier in a mutate() call can be used by later expressions in the same call, so you could keep redefining c1 within a single mutate().
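A minimal sketch of that second point, checked against the pipeline-inside-mutate() approach. The question's sample data has no extra whitespace or quotes, so this uses a small made-up data frame (`messy`, along with `step_by_step` and `piped`, are hypothetical names) where every step has something to do:

```r
library(dplyr)

# Hypothetical messy input (not from the question): extra spaces and embedded quotes.
messy <- data.frame(c1 = c('  "hello"   world ', "plain  text"))

# Each expression in one mutate() sees the c1 produced by the expression before it.
step_by_step <- messy %>%
  mutate(
    c1 = trimws(c1),
    c1 = gsub("\\s+", " ", c1),
    c1 = gsub("\"", "", c1),
    c1 = iconv(toupper(c1), to = "ASCII//TRANSLIT")
  )

# The same steps written as a pipeline inside one mutate().
piped <- messy %>%
  mutate(c1 = c1 %>%
    trimws() %>%
    gsub("\\s+", " ", .) %>%
    gsub("\"", "", .) %>%
    toupper() %>%
    iconv(to = "ASCII//TRANSLIT"))

identical(step_by_step, piped)  # TRUE
step_by_step$c1                 # "HELLO WORLD" "PLAIN TEXT"
```

This still names c1 on every line, so the pipeline form is the one that actually answers the question; the sketch only illustrates that later expressions see earlier results.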

That said, this is how I would do it, using stringr functions for most of the steps:

library(dplyr)
library(stringr)

df2 <- df |>
  mutate(c1 = str_squish(c1) |>
              str_remove_all("\"") |>
              str_to_upper() |>
              iconv(to = "ASCII//TRANSLIT"))
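str_squish() replaces the first two steps of the question's code, trimws() followed by gsub("\\s+", " ", ...). A quick check on ordinary spaces, tabs and newlines (the input vector `x` is made up). Note that str_squish() uses a Unicode-aware whitespace class, so it can differ from trimws() on characters such as non-breaking spaces:

```r
library(stringr)

# Hypothetical input: leading/trailing whitespace plus internal runs of spaces and a tab.
x <- c("  a  b\t c \n", "no_change")

# str_squish() trims both ends and collapses internal whitespace runs to one space.
squished <- str_squish(x)

# The two-step base R equivalent from the question.
two_step <- gsub("\\s+", " ", trimws(x))

identical(squished, two_step)  # TRUE for these inputs
```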
Andy Baxter

Not that you need another solution, but it can be handy to combine all your steps into a single function to tidy up the mutate() call. purrr::compose() combines a series of functions into one. Note that by default it applies them from right to left (the last function runs first), so pass .dir = "forward" to run them in the order they are listed.
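A small sketch of why the direction matters; `add_one` and `times_two` are made-up helpers for illustration:

```r
library(purrr)

add_one   <- function(x) x + 1
times_two <- function(x) x * 2

# Default .dir = "backward": the LAST function runs first, i.e. add_one(times_two(x)).
backward <- compose(add_one, times_two)

# .dir = "forward": functions run in the order listed, i.e. times_two(add_one(x)).
forward <- compose(add_one, times_two, .dir = "forward")

backward(3)  # 7
forward(3)   # 8
```

With the question's accented sample data the default order happens to give the same result, but it would not in general (for example, stripping whitespace before versus after another transformation).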

Using G. Grothendieck's excellent code split into anonymous functions:

library(tidyverse)

df <- data.frame(
  c1 = c("Élève", "Café", "Château", "Noël", "Crème")
)

tidy_text <- compose(
  trimws,
  \(t) gsub("\\s+", " ", t),
  \(t) gsub("\"", "", t),
  toupper,
  \(t) iconv(t, to = "ASCII//TRANSLIT"),
  .dir = "forward"  # apply the functions top to bottom
)

df %>% 
  mutate(c1 = tidy_text(c1))
#>        c1
#> 1   ELEVE
#> 2    CAFE
#> 3 CHATEAU
#> 4    NOEL
#> 5   CREME

Or using Mark's tidyverse code and purrr formula/function syntax:

tidy_text2 <- compose(
  str_squish,
  ~ str_remove_all(.x, "\""),
  str_to_upper,
  ~ iconv(.x, to = "ASCII//TRANSLIT"),
  .dir = "forward"  # apply the functions top to bottom
)

df %>%
  mutate(c1 = tidy_text2(c1))
#>        c1
#> 1   ELEVE
#> 2    CAFE
#> 3 CHATEAU
#> 4    NOEL
#> 5   CREME
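If you would rather not depend on purrr, an ordinary R function gives the same reuse. A minimal sketch; `tidy_text_base` and the `messy` data frame are hypothetical names, not from the answers above:

```r
library(dplyr)

# Plain function applying the cleaning steps in the order written.
tidy_text_base <- function(x) {
  x <- trimws(x)                          # strip leading/trailing whitespace
  x <- gsub("\\s+", " ", x)               # collapse internal whitespace runs
  x <- gsub("\"", "", x)                  # drop double quotes
  iconv(toupper(x), to = "ASCII//TRANSLIT")  # upper-case, then transliterate to ASCII
}

# Hypothetical messy input so every step has something to do.
messy <- data.frame(c1 = c('  "hello"   world ', "plain  text"))

cleaned <- messy %>% mutate(c1 = tidy_text_base(c1))
cleaned$c1  # "HELLO WORLD" "PLAIN TEXT"
```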

Of course, this may be unnecessary if you only use these steps once. It's just one way to keep parts of the code tidier.