Recode string in column based on the name of the column, looked up in another table

77 views Asked by At

I have a df containing values for several ids and multiple variables. There is a reserve code, "no", the same code for all variables, that I would like to recode to different values based on another df, reserves.

library(dplyr)
df <- tribble(~id, ~var1, ~var2, ~var3,
               10, "1.1", "no", "3.45",
               11, "2.3", "4.67", "5.00",
               12, "no", "no", "7.65",
               13, "7.8", "7.8", "no")

reserves <- tribble(~var, ~replacement,
                    "var1", "0.01",
                    "var2", "0.45",
                    "var3", "0.02")

I am looking for a tidy solution to this. I know I can do this manually, as in:

df$var1[data$var1=="no"] <- reserves$replacement[reserves$var==var1]
...

But I would prefer a lookup-style solution that I could pipe into the rest of my workflow. Below is my desired result.

df <- tribble(~id, ~var1, ~var2, ~var,
               10, "1.1", "0.45", "3.45",
               11, "2.3", "4.67", "5.00",
               12, "0.01", "0.45", "7.65",
               13, "7.8", "7.8", "0.02")

So, replacing the "no" values with values specified in another table. Can I get a hand with this?

4

There are 4 answers

0
jkatam On BEST ANSWER

Alternatively please try the below code

library(tidyverse)

df <- df %>% 
  pivot_longer(cols = starts_with('var'), names_to = 'var')


df %>% left_join(reserves, by=c('var')) %>% 
  mutate(value=ifelse(value=='no' & !is.na(replacement), replacement, value)) %>% 
  pivot_wider(id_cols = id, names_from = var, values_from = value)

Created on 2023-09-29 with reprex v2.0.2

# A tibble: 4 × 4
     id var1  var2  var  
  <dbl> <chr> <chr> <chr>
1    10 1.1   0.45  3.45 
2    11 2.3   4.67  5.00 
3    12 0.01  0.45  7.65 
4    13 7.8   7.8   no   
0
Bastián Olea Herrera On

You can use the ifelse function to perform something on a column if a certain condition is met, in this case, var1 == "no", then the following argument is what you want to re place the value with (in your case, the reserve data for said variable), and lastly, the value if the condition is not met, i.e, the same value (var1):

df |> 
  mutate(var1 = ifelse(var1 == "no", reserves |> filter(var == "var1") |> pull(replacement), var1))

Following this, we can create a function that does this for any column, not only for var1:

replace_no <- function(variable) {
  replacement_data <- reserves |> filter(var == rlang::as_string(rlang::ensym(variable))) |> pull(replacement)
  
  ifelse(variable == "no", replacement_data, variable)
}

The we just use out new function:

df |> 
  mutate(var1 = replace_no(var1),
         var2 = replace_no(var2))

Returning this:

# A tibble: 4 × 4
     id var1  var2  var  
  <dbl> <chr> <chr> <chr>
1    10 1.1   0.01  3.45 
2    11 2.3   4.67  5.00 
3    12 0.01  0.01  7.65 
4    13 7.8   7.8   no   
0
jpsmith On

In base R you can use sapply:

ccols <- c("var1", "var2", "var3")

df[ccols] <- sapply(ccols, \(x){
  df[df[,x] == "no", x] <- reserves[reserves$var == x, "replacement"]
  df[,x]
  })

Output:

#      id var1  var2  var3 
#   <dbl> <chr> <chr> <chr>
# 1    10 1.1   0.45  3.45 
# 2    11 2.3   4.67  5.00 
# 3    12 0.01  0.45  7.65 
# 4    13 7.8   7.8   0.02
5
GuedesBF On

A tidyverse(dplyr) solution with a simple mutate(across())

We can transform reserves into a named vector with deframe and use replace. For every(cur_column(), use reseves[cur_column()] for replacement. I corrected the typo like @jpsmith suggested.

library(dplyr)
library(tibble)

reserves <- deframe(reserves)

df |> mutate(across(matches('var\\d+'),
                    \(x) replace(x,
                                 x=='no',
                                 reserves[cur_column()]
                                 )
                    )
             )

# A tibble: 4 × 4
     id var1  var2  var3 
  <dbl> <chr> <chr> <chr>
1    10 1.1   0.45  3.45 
2    11 2.3   4.67  5.00 
3    12 0.01  0.45  7.65 
4    13 7.8   7.8   0.02