R - Conditional Substr from dataframe


I need to extract a substring from a data frame column, where the start and end positions are determined by searching for specific characters.

For example, a single column in a data frame with 3 rows:

'Bond, Mr. :James'
'Woman, Mrs. :Wonder'
'Hood, Mr. :Robin'

The expected result in column 2 is:

'Mr.'
'Mrs.'
'Mr.'

That is, for each row I want to extract the text between ',' and ':' in column 1.

There are 3 answers

0
lotus (BEST ANSWER)

Try gsub(".*, | :.*", "", myvec)
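
As a minimal sketch of how this pattern could be applied to a data frame column rather than a bare vector: the data frame `df` and the column names `name` and `title` below are hypothetical, chosen only for illustration.

```r
# Hypothetical data frame holding the three example rows from the question
df <- data.frame(
  name = c("Bond, Mr. :James", "Woman, Mrs. :Wonder", "Hood, Mr. :Robin"),
  stringsAsFactors = FALSE
)

# Delete everything up to and including ", " and everything from " :" onward;
# what remains is the text between the comma and the colon
df$title <- gsub(".*, | :.*", "", df$name)

print(df$title)
```

Because `.*` is greedy, a row containing more than one ", " would be cut at the last one, so this sketch assumes exactly one comma before the title, as in the sample data.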

0
josliber

You can use the stringr package to perform common string operations like trimming, substrings, or extracting patterns:

library(stringr)
str_trim(str_sub(str_extract(x, ",[^:]*"), 2))
# [1] "Mr."  "Mrs." "Mr." 
2
Pierre L

Also:

str_extract(x, 'Mrs?\\.')

@akrun suggested a pattern that handles more cases:

str_extract(myvec, '\\S+(?=\\s*:)')

The pattern matches a run of non-space characters that is followed by optional whitespace and a colon, so it captures a variety of titles and honorifics rather than only 'Mr.' and 'Mrs.'.
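
To illustrate the lookahead on a wider set of titles, here is a rough sketch that uses base R's regexpr() and regmatches() with perl = TRUE in place of stringr, so it runs without extra packages. The last two strings in `myvec` are made up for this example and are not from the question.

```r
# Sample strings; the last two rows are invented to show other honorifics
myvec <- c("Bond, Mr. :James", "Woman, Mrs. :Wonder",
           "Who, Dr. :Doctor", "Tuck, Fr.:Friar")

# First run of non-space characters that is followed by optional
# whitespace and a colon; perl = TRUE is needed for the (?=...) lookahead
titles <- regmatches(myvec, regexpr("\\S+(?=\\s*:)", myvec, perl = TRUE))

print(titles)
```

One difference worth keeping in mind: regmatches() silently drops elements that have no match, whereas str_extract() returns NA for them, so the stringr version is the safer choice when the result has to line up with the rows of a data frame.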