Dynamically create dataframe and variable names in R similar to macros in SAS and STATA

1.2k views Asked by At

I've been trying to find a solution for this for over a year already and decided to write a post about it. Any help would be appreciated. Here is the pseudocode that I can do easily in Stata and SAS but I don't know how to do in R. {} is the glue-like operator that was introduced into dplyr this year, so I'm using it as a placeholder for the operator that makes the pseudocode work.

library(tidyverse)
var <- "mpg"
df_name <- "mtcars"

{df_name} %>% count({var})

{df_name}_1 <- {df_name} %>% mutate(., {var}_1={var}/2)

length({df_name}_1)

should lead to

library(tidyverse)
var <- "mpg"
df_name <- "mtcars"

mtcars %>% count(mpg)

mtcars_1 <- mtcars %>% mutate(., mpg_1=mpg/2)

length(mtcars_1)

In Stata, I can easily do with local or global macros like this:

local df_name "mtcars"

then reference it as `df_name'

In SAS I can do it with global macros like this:

 %LET df_name=mtcars;

then reference it like &df_name.

Please note how visually easy it is to reference these values -- no assigns, gets, parentheses, mgets, etc .

Both approaches allow to use them in dataset names, functions, variables, etc. Simplifies my code tremendously and saves me tons of time. How to do this with visual simplicity in R? My code should be readable for people familiar with Stata/SAS (dplyr is awesome in this regard!) and too many evals, wrapping everything in functions, assigns with parentheses will just make them give up on the project or force me to change back to SAS/Stata.

I tried all combinations of {{}}, !!, enquo, sym, and the NSE and still don't know how to make this work in a visually simple way. In dplyr pipes, there is finally some workaround for the variable names but nothing for the dataframes and base R.

I would really appreciate any help in this matter! I had this problem back in 2009 with R and gave up on R until I had to come back in 2019 and still can't find an easy way to approach this.

2

There are 2 answers

1
IRTFM On

I do not see that needing to use get in R in place of needing to use LET in SA is any more difficult. The dplyr mechanism for turning named character values into R names or "language"-expressions is the !! operator, so this seems to be an exact implementation of the first part of your request:

 library(tidyverse)
 var <- "mpg"
 df_name <- get("mtcars")  # this will create a separate copy
 
 
 df_name %>% count(!!var)
#  "mpg"  n
#1   mpg 32

You will need to use an assign operation for the next parts, though. I don't think dplyr has gone the route of trying to emulate the macro character of SAS which most experienced R users would consider as an abomination. There was an effort about 10 years ago on the part of one of the R-greats to illustrate how R could do macro-processing, but I think he did it more as to show why it should not be done than with any intent to make R more SAS-like. (I'll now go out and look for the article.)

Here's the document. It is closer to 20 years ago. Go to page 11.

1
MrFlick On

R is a functional programming language. You would use meta-programming to create code rather than using macro style text replacement like SAS to create new features. Variable names in R are symbols, not strings. Trying to force these two different programming paradigms to look the same is generally not a good idea and leads to very un-idiomatic in one of the two languages.

With R it's better to track the data itself rather than the name of the data and track the column you want to use with a symbol.

library(tidyverse)

df <- mtcars
var <- rlang::sym("mpg")

Then you can just do

df %>% count(!!var)

or create a function

get_counts <- function(data, x) {
    data %>% count({{x}})
}
get_counts(df, mpg)   #use actual column name
get_counts(df, !!var) #or use name from variable with !!

You can name new variables using := and some glue style sytnax with dplyr

df %>% mutate(., "{var}_1" := !!var/2)

which you could also put into a function

create_new_var <- function(data, x) {
  data %>% mutate(., "{{x}}_1":={{x}}/2)
}
create_new_var(df, mpg)
create_new_var(df, !!var)

And then rather than creating variables with numeric indexes in their name, you would just use pipes

df %>% 
   create_new_var(!!var) %>%
   length()

Or if you have multiple values to work with, you keep values in named lists. And then you can map functions over the lists. For eample

df_name <- "mtcars"
data <- mget(df_name, inherits = TRUE)
fixed <- map(data, ~create_new_var(., !!var))
lens <- map(fixed, ~length(.))
lens$mtcars
# [1] 12
lens[[df_name]]
# [1] 12

This would also work if you had multiple data.frames stored in the data list

I would strongly discourage you from using something like this, but you could define a new type of assignment that would allow for string-line variable names. For example

`%<-%` <-  function(x, value) {
  varname <- glue::glue(x, .envir = parent.frame())
  invisible(assign(varname, value, envir = parent.frame()))
}

df_name <- "mtcars"
"{df_name}_1" %<-% { get(df_name) %>% create_new_var(!!var) }

Here we defined %<-% rather than <- that will take string-like names and expand them and turn them into new variable. And we use get() to get values for variables with strings. Note that we need {} around the expression we want to assign to the new value because we can't control the precedence of our new operator. It has the same precedence as the pipe operator so everything would go left to right by default. This is not something a proper R programming would likely do though.