R Regex: Parentheses Not Acting as Metacharacters


I am trying to split a string on the group "%in%" and on the character "@". All the documentation I can find says that parentheses are metacharacters used for grouping in R regex. So the code

    > strsplit('example%in%aa(bbb)aa@cdef', '[(%in%)@]', perl=TRUE)

SHOULD give me

    [[1]]
    [1] "example"   "aa(bbb)aa" "cdef"

That is, it should leave the parentheses in "aa(bbb)aa" alone, because the parentheses in the matching expression are not escaped. But instead it ACTUALLY gives me

    [[1]]
    [1] "example" ""   ""    ""    "aa"    "bbb"   "aa"    "cdef"

as if the parentheses were not metacharacters! What is up with this and how can I fix it? Thanks!

This is true with and without the argument perl=TRUE in strsplit.

There are 3 answers

Joshua Ulrich (accepted answer)

Not sure what documentation you're reading, but the Extended Regular Expressions section in ?regex says:

Most metacharacters lose their special meaning inside a character class. ... (Only '^ - \ ]' are special inside character classes.)
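
To see this concretely, the sketch below uses base R's gregexpr() and regmatches() to list what the pattern from the question actually matches (x is just the question's example string). Because the brackets make it a character class, it matches any single one of (, %, i, n, ) or @, and every such match becomes a split point; the three empty strings in the question's output come from the consecutive delimiters %, i, n, %.

```r
x <- 'example%in%aa(bbb)aa@cdef'

# Every single-character match of the bracketed pattern from the question
regmatches(x, gregexpr('[(%in%)@]', x))[[1]]
# [1] "%" "i" "n" "%" "(" ")" "@"

# Order and repetition inside [...] do not matter: it is just a set of characters
identical(strsplit(x, '[(%in%)@]'), strsplit(x, '[%in()@]'))
# [1] TRUE
```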

You don't need to create a character class. Just use alternation, | ("or"). You likely don't need to group "%in%" either, but it shouldn't hurt anything:

    > strsplit('example%in%aa(bbb)aa@cdef', '(%in%)|@', perl=TRUE)
    [[1]]
    [1] "example"   "aa(bbb)aa" "cdef"
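
A quick check of the remark about grouping: strsplit() uses each match only to decide where to cut and does not return captured text, so the grouped and ungrouped patterns give identical results. The sketch below also compares against the default (non-PCRE) engine.

```r
x <- 'example%in%aa(bbb)aa@cdef'

grouped   <- strsplit(x, '(%in%)|@', perl = TRUE)
ungrouped <- strsplit(x, '%in%|@',   perl = TRUE)

# The capture group changes nothing about where the string is split
identical(grouped, ungrouped)
# [1] TRUE

# The default engine (perl = FALSE) gives the same split
identical(ungrouped, strsplit(x, '%in%|@'))
# [1] TRUE
```
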
nhahtdh

Inside a character class [], most characters lose their special meaning, including ( and ).

You might want this regex instead:

    '%in%|@'
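
The flip side of the same rule: when you do want a literal parenthesis, putting it inside a character class ('[(]') is one way to get it. Escaping it (the R string '\\(', i.e. the regex \() or skipping regex entirely with fixed = TRUE are others. The sketch below shows that all three split the example string at the "(" only.

```r
x <- 'example%in%aa(bbb)aa@cdef'

# Three ways to treat "(" as an ordinary character
in_class <- strsplit(x, '[(]')               # literal because it is inside [...]
escaped  <- strsplit(x, '\\(')               # R string '\\(' is the regex \(
literal  <- strsplit(x, '(', fixed = TRUE)   # plain substring match, no regex

in_class[[1]]
# [1] "example%in%aa" "bbb)aa@cdef"
identical(in_class, escaped) && identical(escaped, literal)
# [1] TRUE
```
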
agstudy

No need to use [ or ( here; just this:

    strsplit('example%in%aa(bbb)aa@cdef', '%in%|@')
    [[1]]
    [1] "example"   "aa(bbb)aa" "cdef"
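
If the delimiters are always literal strings, one way to sidestep this kind of mistake is to quote each delimiter before joining them with |. The helper below is a hypothetical sketch (split_on_literals is not a base R function). It relies on PCRE's \Q...\E quoting, so it needs perl = TRUE, and it assumes no delimiter itself contains "\E".

```r
# Hypothetical helper: split x on any of several literal delimiter strings.
# Each delimiter is wrapped in PCRE's \Q...\E so regex metacharacters such as
# ( ) [ ] | are matched literally. Assumes no delimiter contains "\E".
split_on_literals <- function(x, delims) {
  pattern <- paste0("\\Q", delims, "\\E", collapse = "|")
  strsplit(x, pattern, perl = TRUE)
}

x <- 'example%in%aa(bbb)aa@cdef'
split_on_literals(x, c("%in%", "@"))[[1]]
# [1] "example"   "aa(bbb)aa" "cdef"
split_on_literals(x, c("(", ")"))[[1]]
# [1] "example%in%aa" "bbb"           "aa@cdef"
```
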