Extract a substring in R

> ldata2[2]
    [1] "  \"pretty\": \"5:06 PM GMT on June 18, 2015\","
# Need to extract only the time information. In this case "5:06 PM GMT on June 18, 2015"
# My attempt
> time <- sub(".* :\"(.*)".*","\\1",ldata2[1])

This is the error message I get: Error: unexpected symbol in "time <- sub(".* :\"(.*)"." Any help is appreciated.

There are 2 answers

cr1msonB1ade On BEST ANSWER

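Your call has two problems. First, the double quote after (.*) is not escaped, so it ends the string early and R stops with the "unexpected symbol" parse error. Second, the pattern has the space before the colon instead of after it, so even once it parses it does not match the string and nothing is replaced. The string you want is also in ldata2[2], not ldata2[1]. Here is the correct pattern:

sub(".*: \"(.*)\".*","\\1",ldata2[2])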
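To see the corrected call run end to end, here is a minimal sketch. It assumes ldata2 is a character vector whose second element is the line from the question; the first element is a made-up placeholder, since the question does not show it.

```r
# Hypothetical stand-in for the asker's data; only element 2 comes from the question.
ldata2 <- c("{", "  \"pretty\": \"5:06 PM GMT on June 18, 2015\",")

# Inside a double-quoted R string every literal double quote must be written as \",
# otherwise it terminates the string and the parser reports "unexpected symbol".
time <- sub(".*: \"(.*)\".*", "\\1", ldata2[2])

print(time)
# [1] "5:06 PM GMT on June 18, 2015"
```

The leading .* and trailing .* consume everything outside the quoted value, so replacing the whole match with \\1 leaves only the captured text.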
Pierre L On
library(stringr)
str_match(x, ': \\"(.*)\\"')[2]
#[1] "5:06 PM GMT on June 18, 2015"

I used cat() to see the actual characters in the string while building the regex pattern:

x <- "  \"pretty\": \"5:06 PM GMT on June 18, 2015\","
cat(x)
"pretty": "5:06 PM GMT on June 18, 2015",

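The backslashes are gone in the cat() output because they exist only in R's printed representation of the string, not in the string itself, so the regex does not need to reference them. The pattern ': \\"(.*)\\"' starts with a colon, a space and a double quote. The colon and space need no escaping. The double quotes have no special regex meaning either, and inside a single-quoted R string they could be written unescaped, as in ': "(.*)"'; the \\" form simply produces the regex \", which also matches a literal double quote. Next come the capture group and the closing double quote.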

With sub:

sub('.*: \\"(.*)\\",', '\\1', x)
[1] "5:06 PM GMT on June 18, 2015"
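One caveat with sub is that it returns the input unchanged when the pattern does not match, which can hide a bad pattern. A possible base R alternative, sketched here with regexec and regmatches, returns NA for lines without a quoted value. The second element of x is a made-up non-matching line added only for illustration.

```r
# First element is the string from the question; the second is a hypothetical
# line with no quoted value, included to show the no-match case.
x <- c("  \"pretty\": \"5:06 PM GMT on June 18, 2015\",",
       "  no quoted value here")

# regexec() records the position of the full match and of each capture group;
# regmatches() turns those positions into the matched substrings.
hits <- regmatches(x, regexec(': "(.*)"', x))

# Each element of hits is c(full match, capture group), or character(0) on no match.
values <- vapply(
  hits,
  function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
  character(1)
)

print(values)
# [1] "5:06 PM GMT on June 18, 2015" NA
```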