My gratitude in advance for any help, and apologies for not being able to figure this out from other examples.
I have a vector containing names of files such as: vec = c("Img_1_(set1)_2L4_s.ext", "Img_37_(set19)_2R4_s.ext", "Img_187_(set94)_4L4_s.ext", "Img_77_(set39)_4R2_s.ext")
I want to create two separate additional vectors by extracting:
1. The key letter (either L or R) between the two adjacent numbers, which vary from case to case. e.g., result: L, R, L, R
2. The "set" string plus the number attached to it (which varies across cases) between brackets, both with and without the brackets. e.g., result1: (set1), (set19), (set94), (set39); result2: set1, set19, set94, set39
Ideally using stringr, but I'm open to other (simpler?) libraries/functions.
For case 1, I tried str_extract(vec, "(?<= \\)_)[0-9]*") as a way to match the ")_" pattern followed by a number [0-9], but all I get in return are NAs (I think I'm not escaping the ")" correctly).
For case 2, I had to make do by simply extracting the set numbers with str_extract(vec, "(?<=set)[0-9]*") and creating another variable by pasting the word "set" back on; obviously not ideal with large data frames.
The "set" pattern is nice and easy: the letters "set" followed by one or more numbers, "[0-9]+", which gives "set[0-9]+".

At least for your examples, it seems like the letters L and R don't show up anywhere else, so we can use a very simple pattern for them too and just look for an L or an R: "L|R".

If you're worried about potentially getting false hits on the L or R because they might show up elsewhere in the input, you could make the pattern more specific, for example looking behind for a number with "(?<=[0-9])" and looking ahead for a number with "(?=[0-9])": "(?<=[0-9])(L|R)(?=[0-9])".

And if you do want the parens with the set, you escape the parens to include them in the pattern: "\\(set[0-9]+\\)".
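Putting those patterns together on your example vector, here is a minimal sketch. It assumes stringr is installed, and the result names (set_plain, set_brackets, side_simple, side_strict) are only illustrative, not anything from your code.

```r
library(stringr)

vec <- c("Img_1_(set1)_2L4_s.ext", "Img_37_(set19)_2R4_s.ext",
         "Img_187_(set94)_4L4_s.ext", "Img_77_(set39)_4R2_s.ext")

# "set" followed by one or more digits, without the brackets
set_plain <- str_extract(vec, "set[0-9]+")

# Same pattern, with the literal brackets escaped so they are part of the match
set_brackets <- str_extract(vec, "\\(set[0-9]+\\)")

# Simple version: the first L or R found in each string
side_simple <- str_extract(vec, "L|R")

# Stricter version: an L or R that has a digit directly before and after it
side_strict <- str_extract(vec, "(?<=[0-9])(L|R)(?=[0-9])")
```

Since str_extract() is vectorised over its input, the same calls should work directly on a data frame column, so there is no need to extract the number and paste "set" back on.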