Split string on [:punct:] except for underscore in R

I have an equation as a string where the variables in the string equation are variables in the R workspace. I would like to replace each variable with its numeric value in the R workspace. This is easy enough when the variable names don't contain punctuation.

Here is a simple example.

x <- 5
y <- 10
yy <- 15
z <- x*(y + yy)
zAsChar <- "z=x*(y+yy)"
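# Variable names: split the string on punctuation
vars <- unlist(strsplit(zAsChar, "[[:punct:]]"))
# Operators and brackets: split on every character that is not punctuation
notVars <- unlist(strsplit(zAsChar, "[^[:punct:]]"))
# Look up each variable's value in the workspace, dropping empty tokens
varsValues <- sapply(vars[vars != ""], FUN=function(aaa) get(aaa))
notVarsValues <- notVars[notVars != ""]
# Interleave values and operators, then collapse into a single string
paste(paste0(varsValues, notVarsValues), collapse="")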

This yields "125=5*(10+15)", which is great.

However, I would love the option to use underscores in the variable names so that I can use "subscripts" for variable names. I am using these strings in math mode in R markdown.

So I need a version of [:punct:] that excludes _. I tried using [\\+\\-\\*\\/\\(\\)\\=] instead of [:punct:], but with that approach I couldn't split on the minus sign. Is there a way to split on punctuation while preserving the _?
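
For reference, a hand-written class can probably be made to split on the minus sign by placing the hyphen first (or last) inside the brackets, where it is read literally instead of as a range operator. With R's default regex engine, backslashes inside a bracket expression are not reliably treated as escapes, which is a likely cause of the failure described above. The sketch below assumes a hypothetical subscripted variable x_1 and covers only the operators listed in the question.

```r
# Hyphen placed first in the class so it is a literal, not a range operator.
# Underscore is not in the class, so names such as x_1 stay intact.
ops <- "[-+*/()=]"
eq <- "z=x_1*(y-yy)"   # hypothetical equation with a subscripted name

tokens <- unlist(strsplit(eq, ops))
tokens <- tokens[tokens != ""]   # drop the empty token between "*" and "("
print(tokens)
```

The drawback of an explicit list is that any operator left out of it (for example ^) will not be split on.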

There are 2 answers

Casimir et Hippolyte

Instead of [:punct:], use the Unicode character class \pP (shorthand for \p{P}) together with its negation \PP:

[^\\PP_]

(This requires the perl=TRUE option.)
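
One caveat worth checking before relying on \pP: it covers only the Unicode punctuation categories, and characters such as + and = are classified as math symbols (category Sm), so they may not be split on. An alternative sketch, not taken from the answer above, keeps [[:punct:]] and excludes the underscore with a negative lookahead. The variable name x_1 is hypothetical, and the operator split uses [[:alnum:]_] as an assumed stand-in for "anything that can appear in a variable name".

```r
eq <- "z=x_1*(y+yy)"   # hypothetical equation with a subscripted name

# (?!_) rejects an underscore before [[:punct:]] is tried.
# Lookaheads need the PCRE engine, hence perl = TRUE.
vars <- unlist(strsplit(eq, "(?!_)[[:punct:]]", perl = TRUE))
vars <- vars[vars != ""]

# Operators: split on every letter, digit, or underscore instead.
notVars <- unlist(strsplit(eq, "[[:alnum:]_]"))
notVars <- notVars[notVars != ""]

print(vars)
print(notVars)
```

Both vectors can then be fed into the same get() and paste0() steps used in the question.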

MrFlick

Are you sure you need to do all this string manipulation? The substitute() function can help you out:

substitute(z==x*(y+yy), list(x=x, y=y, yy=yy,z=z))

Or, if you really need to start with a character value:

do.call("substitute", list(parse(text=zAsChar)[[1]],list(x=x, y=y, yy=yy,z=z)))
# 125 = 5 * (10 + 15)

You can deparse() the result to turn it back into a character.
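
As a rough sketch of the full round trip from string to string, the example below parses the equation, substitutes the values, and deparses the result. It uses a hypothetical underscore name x_1 to show that no regex is needed for subscripted variables. Collecting the values with all.vars() and mget() is an assumption added here to avoid typing the list by hand; it is not part of the answer above.

```r
x_1 <- 5; y <- 10; yy <- 15
z <- x_1 * (y + yy)
zAsChar <- "z=x_1*(y+yy)"

# Parse the string into an unevaluated call: z = x_1 * (y + yy)
expr <- parse(text = zAsChar)[[1]]

# Look up every variable named in the expression in the current environment
vals <- mget(all.vars(expr), envir = environment())

# Replace the names with their values, then convert back to a string
filled <- do.call("substitute", list(expr, vals))
result <- deparse(filled)
print(result)
```

Note that deparse() normalizes spacing, so the output string has spaces around the operators even though the input did not.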