rpy2 Error: "unrecognized escape in character string"


I have a chunk of R code that I would like to embed in my Python code, and I am using rpy2 for that. The R code involves many regular expressions, and it seems that rpy2 is not handling them correctly, or perhaps I am not writing them correctly.

Here is an example of a piece of code that works and another that does not:

1) It works: a very trivial removeStopWords function:

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

robjects.r('''
library(data.table)
library(tm)

removeStopWords <- function(x) gsub("  ", " ", removeWords(x, stopwords("english")))

''')

In [4]: r_f = robjects.r['removeStopWords']
In [5]: r_f('I want to dance')[0]
Out[5]: 'I want dance'

2) It does not work: another trivial function, this one to remove leading and trailing spaces:

robjects.r('''
library(data.table)
library(tm)

trim <- function (x) gsub("^\\s+|\\s+$", "", x)

''')

 Error: '\s' is an unrecognized escape in character string starting ""^\s"
p = rinterface.parse(string)
Abort

and then I am kicked out of IPython (the whole session aborts).

I have also tried calling the parser directly:

import rpy2.rinterface as ri
exp = ri.parse('trim <- function (x) gsub("^\\s+|\\s+$", "", x)') 

but the result is the same: Abort, and then IPython exits.

At this stage I don't really know what to try. The R code is quite large, so porting all of it from R to Python would take me some time, and I would prefer to avoid that.

Any help is much appreciated!

Thanks in advance for your time.


Best answer (hajtos):

When you write \\ in a Python string literal, it is stored as a single \, because \ is the escape character in Python. So when R parses the code, it sees "^\s+|\s+$". But \ is also the escape character in R string literals, and \s is not a recognized escape sequence there, hence the error.
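A quick way to confirm this is to inspect, on the Python side only, the exact text that would be handed to R. This is a sketch that does not start R or import rpy2; `py_literal` and `r_src` are just illustrative names holding the pattern and the function definition from the question.

```python
# The regex exactly as it is spelled in the question's Python source.
py_literal = "^\\s+|\\s+$"

# Python has already collapsed each "\\" into ONE backslash character,
# so this 9-character string is what R's parser would receive.
print(py_literal)       # ^\s+|\s+$
print(len(py_literal))  # 9

# Same effect on the full R definition passed to robjects.r(...):
r_src = 'trim <- function (x) gsub("^\\s+|\\s+$", "", x)'
print(r_src)            # trim <- function (x) gsub("^\s+|\s+$", "", x)

# No doubled backslash survives, which is why R reports '\s' as an
# unrecognized escape in the string starting "^\s.
assert "\\\\" not in r_src
```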

If you want R to receive "^\\s+|\\s+$", you need to write "^\\\\s+|\\\\s+$" in Python (twice the number of backslashes).
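The sketch below checks the doubled-backslash fix on the Python side, without starting R. It also shows an alternative that the answer does not mention: a Python raw string (`r'...'`) leaves backslashes untouched, so the R code can be pasted as-is. The variable names are illustrative only.

```python
# Option 1 (from the answer): double every backslash in a normal string.
doubled = 'trim <- function (x) gsub("^\\\\s+|\\\\s+$", "", x)'

# Option 2 (alternative): a raw string keeps backslashes verbatim,
# so the R source can be copied in unchanged.
raw = r'trim <- function (x) gsub("^\\s+|\\s+$", "", x)'

# Both produce identical text: R now sees \\s, which its own string
# parser turns into the regex escape \s.
assert doubled == raw
print(raw)  # trim <- function (x) gsub("^\\s+|\\s+$", "", x)
```

With rpy2 installed, either string could then be passed to `robjects.r(...)` as in the question. For the large triple-quoted block, prefixing it with `r` (`robjects.r(r'''...''')`) should have the same effect as doubling every backslash by hand.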