Build a function that calculates a value from dynamically chosen variables using quotation in R


I have data with many variables. The sample below has only var1 and var2, but my real data has var1 through var50.

sample_dta = data.frame(group_id = c(rep(1,3),rep(2,3)),
                        var1.x = seq(1,6),
                        var1.y = seq(21,26),
                        var2.x = seq(31,36),
                        var2.y = seq(41,46))

For this data I have the function below. It should compute a, the mean absolute difference between two chosen columns (point1 and point2) within each group_id.

sample_fun = function(dta,width = 0,height = 0,point1,point2){
  if(width == 1){
    point1 = paste0("var",point1,".x")
    point2 = paste0("var",point2,".x")
  }
  
  if(height == 1){
    point1 = paste0("var",point1,".y")
    point2 = paste0("var",point2,".y")
  }
  
  dta %>% group_by(group_id) %>%
    summarise(a = mean(abs(!!point1 - !!point2),na.rm = T))
  
}

With this function I am trying to run sample_fun(sample_dta, width = 1, point1 = 1, point2 = 2), but it doesn't work, and I can't find where the problem is.


There are 2 answers

Answer by akrun

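Because the column names are built as strings inside the function, !!point1 only injects the string itself, so the expression becomes "var1.x" - "var2.x", and R cannot subtract two strings. We need to convert each string to a symbol with rlang::sym() and then unquote it with !! so that it is evaluated as a column: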

library(dplyr)

sample_fun = function(dta,width = 0,height = 0,point1,point2){
  # build the column names as strings, e.g. "var1.x"
  if(width == 1){
    point1 = paste0("var",point1,".x")
    point2 = paste0("var",point2,".x")
  }
  
  if(height == 1){
    point1 = paste0("var",point1,".y")
    point2 = paste0("var",point2,".y")
  }
  
  # rlang::sym() turns each string into a symbol; !! injects it so that
  # summarise() evaluates it as a column of dta
  dta %>% 
    group_by(group_id) %>% 
    summarise(a = mean(abs(!! rlang::sym(point1) - !! rlang::sym(point2)),
                       na.rm = TRUE),
              .groups = 'drop')
  
}

Checking:

sample_fun(sample_dta,width = 1,point1 = 1,point2 = 2)
# A tibble: 2 x 2
#  group_id     a
#     <dbl> <dbl>
#1        1    30
#2        2    30
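To see why the symbol conversion matters, a small sketch with rlang::expr() shows the expression each version builds. It assumes only that rlang is installed (dplyr already depends on it); the toy data frame in the last step is made up for illustration.

```r
library(rlang)

point1 <- "var1.x"
point2 <- "var2.x"

# Injecting the strings directly inlines character constants into the call.
# This is what the question's version builds, and `-` cannot subtract strings.
e_str <- expr(!!point1 - !!point2)
e_str
#> "var1.x" - "var2.x"

# Converting to symbols first inlines column references instead.
e_sym <- expr(!!sym(point1) - !!sym(point2))
e_sym
#> var1.x - var2.x

# Evaluated against a data frame, the symbol version finds the columns.
eval_tidy(e_sym, data = data.frame(var1.x = 1:3, var2.x = 31:33))
#> [1] -30 -30 -30
```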

If there are many pairs of variables, a better option may be either to loop over the pairs with purrr::map() or to pivot to long format (for example with tidyr::pivot_longer()) and do the subtraction once.
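A minimal sketch combining both ideas, assuming tidyr and purrr are installed: reshape once so that the .x and .y values of each variable share a row (with an axis column), then map over the pairs of variable numbers. The helper pair_mean() and the pairs table are hypothetical names for illustration, not part of either answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

sample_dta <- data.frame(group_id = c(rep(1, 3), rep(2, 3)),
                         var1.x = seq(1, 6),
                         var1.y = seq(21, 26),
                         var2.x = seq(31, 36),
                         var2.y = seq(41, 46))

# Reshape once: one row per original row and axis ("x" = width, "y" = height),
# with one column per variable (var1, var2, ...).
long <- sample_dta %>%
  pivot_longer(-group_id,
               names_to = c(".value", "axis"),
               names_pattern = "(var\\d+)\\.(x|y)")

# Hypothetical helper: mean absolute difference between varN and varM,
# computed for both axes in a single grouped summarise().
pair_mean <- function(dta, p1, p2) {
  col1 <- paste0("var", p1)
  col2 <- paste0("var", p2)
  dta %>%
    group_by(group_id, axis) %>%
    summarise(a = mean(abs(.data[[col1]] - .data[[col2]]), na.rm = TRUE),
              .groups = "drop") %>%
    mutate(point1 = p1, point2 = p2)
}

# Pairs to compare; with var1 ~ var50 these could come from combn(50, 2).
pairs <- data.frame(p1 = 1, p2 = 2)
result <- map2_dfr(pairs$p1, pairs$p2, ~ pair_mean(long, .x, .y))
result
```

For the sample data this gives one row per group and axis: a is 30 on the x axis (matching width = 1 above) and 20 on the y axis.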

Answer by Ronak Shah

You can use the .data pronoun to look up the columns whose names are stored in the strings point1 and point2.

library(dplyr)

sample_fun = function(dta,width = 0,height = 0,point1,point2){
  if(width == 1){
    point1 = paste0("var",point1,".x")
    point2 = paste0("var",point2,".x")
  }
  
  if(height == 1){
    point1 = paste0("var",point1,".y")
    point2 = paste0("var",point2,".y")
  }
  # .data[[name]] looks up the column whose name is stored in a string
  dta %>%
    group_by(group_id) %>%
    summarise(a = mean(abs(.data[[point1]] - .data[[point2]]),na.rm = TRUE))
  
}

sample_fun(sample_dta,width = 1,point1 = 1,point2 = 2)

#  group_id     a
#     <dbl> <dbl>
#1        1    30
#2        2    30
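Both answers keep the question's width/height flags. One thing to watch: if both are set to 1, the height branch runs paste0() on the already-built name and produces "varvar1.x.y". A minimal sketch of a variant that avoids this, assuming a single hypothetical axis argument in place of the two flags, with the .data lookup from this answer:

```r
library(dplyr)

sample_dta <- data.frame(group_id = c(rep(1, 3), rep(2, 3)),
                         var1.x = seq(1, 6),
                         var1.y = seq(21, 26),
                         var2.x = seq(31, 36),
                         var2.y = seq(41, 46))

# Hypothetical variant: `axis` replaces width/height, so exactly one
# suffix is ever appended to each column name.
sample_fun2 <- function(dta, point1, point2, axis = c("x", "y")) {
  axis <- match.arg(axis)  # errors on anything other than "x" or "y"
  col1 <- paste0("var", point1, ".", axis)
  col2 <- paste0("var", point2, ".", axis)
  dta %>%
    group_by(group_id) %>%
    summarise(a = mean(abs(.data[[col1]] - .data[[col2]]), na.rm = TRUE),
              .groups = "drop")
}

res_x <- sample_fun2(sample_dta, point1 = 1, point2 = 2, axis = "x")  # like width = 1
res_y <- sample_fun2(sample_dta, point1 = 1, point2 = 2, axis = "y")  # like height = 1
```

With match.arg(), a typo such as axis = "z" fails immediately instead of silently leaving point1 and point2 as bare numbers.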