I want to extract all substrings that begin with M and are terminated by a *.
Take the string below as an example:
vec<-c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
Ideally, this would return:
MGMTPRLGLESLLE
MTPRLGLESLLE
I have tried the code below:
regmatches(vec, gregexpr('(?<=M).*?(?=\\*)', vec, perl=T))[[1]]
However, this drops the leading M and returns only the first substring rather than all of the substrings within:
"GMTPRLGLESLLE"
You can use (?=(M[^*]*)\*)
See the regex demo. Details:
- (?= - start of a positive lookahead that matches a location that is immediately followed with:
- (M[^*]*) - Group 1: M, then zero or more chars other than a * char
- \* - a * char
- ) - end of the lookahead.
See the R demo:
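A minimal sketch of how the pattern above might be applied, assuming the stringr package is installed (the variable names `matches` and `result` are illustrative). Because the lookahead consumes no characters, the regex engine can retry at every position, which is what allows overlapping substrings to be captured in Group 1:

```r
library(stringr)

vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")

# str_match_all() returns one matrix per input string:
# column 1 is the whole (zero-length) match, column 2 is capture group 1.
matches <- str_match_all(vec, "(?=(M[^*]*)\\*)")

# Keep only the Group 1 column and flatten to a character vector.
result <- unlist(lapply(matches, function(m) m[, 2]))
print(result)
```

The trailing MIRVASQ is not returned because no * follows it.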
If you prefer a base R solution:
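One possible base R sketch, assuming the same lookahead pattern: with perl=TRUE, gregexpr() records the position and length of each capture group in the "capture.start" and "capture.length" attributes, so Group 1 can be cut out with substring(). The helper names `starts`, `lens`, and `keep` are illustrative:

```r
vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")

# Find every position where the lookahead succeeds (matches are zero-length).
m <- gregexpr("(?=(M[^*]*)\\*)", vec, perl = TRUE)[[1]]

# Capture-group positions and lengths, one row per match, one column per group.
starts <- attr(m, "capture.start")[, 1]
lens <- attr(m, "capture.length")[, 1]

# gregexpr() reports -1 when nothing matched, so drop those entries.
keep <- m > 0
result <- substring(vec, starts[keep], starts[keep] + lens[keep] - 1)
print(result)
```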