I have different character strings that look like this:
t <- c("probable linoleate 9S-lipoxygenase 5 [Malus domestica]", "PREDICTED: protein STRUBBELIG-RECEPTOR FAMILY 3 [Malus domestica]")
I want to remove the 'PREDICTED:' from the character string containing it.
My script looks like this:
t <- sapply(strsplit(t, split= ": ", fixed = TRUE), function(x) (x[2]))
But this is the result:
[1] NA "protein STRUBBELIG-RECEPTOR FAMILY 3 [Malus domestica]"
So, for some reason, it erased t[1], and correctly performed the operation on t[2]. I tried adding grep() to my string:
t <- sapply(strsplit(t, if(grep('^*.', t), split= ": " else t, fixed = TRUE), function(x) (x[2]))).
I also tried writing a loop:
for(i in t){
if(i == grep('PREDICTED', t[i]) split= ": " else t[i])
}
Any help is greatly appreciated. Thanks!
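To remove the PREDICTED: word, you may use a simple non-regex sub, that is, sub() called with fixed = TRUE so that the pattern is treated as a literal string rather than a regular expression.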
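The literal-string approach can be sketched as follows. This is a minimal example using the sample vector from the question; the names x and res_fixed are chosen here for illustration (x instead of t, to avoid masking base::t) and are not part of the original post.

```r
# Sample data from the question (variable name x is an assumption)
x <- c("probable linoleate 9S-lipoxygenase 5 [Malus domestica]",
       "PREDICTED: protein STRUBBELIG-RECEPTOR FAMILY 3 [Malus domestica]")

# fixed = TRUE makes sub() match "PREDICTED: " literally, not as a regex.
# Elements that do not contain the pattern are returned unchanged.
res_fixed <- sub("PREDICTED: ", "", x, fixed = TRUE)
print(res_fixed)
```

Unlike the strsplit() attempt, elements without the prefix are not turned into NA, because sub() simply returns a string as-is when nothing matches.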
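If the word before the first colon can be anything, use a regex solution with the pattern ^[^:]*:\\s* instead. Here, ^[^:]*:\\s* matches 0+ chars other than : at the start of the string, then a : and then 0+ whitespaces (this is matched only once since sub is used, not gsub).
In both cases, the output is the same: the first string is left unchanged, and the second becomes "protein STRUBBELIG-RECEPTOR FAMILY 3 [Malus domestica]".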
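The regex variant can be sketched like this. It is a minimal example on the question's sample vector; the names x and res_regex are illustrative assumptions, not taken from the original post.

```r
# Sample data from the question (variable name x is an assumption)
x <- c("probable linoleate 9S-lipoxygenase 5 [Malus domestica]",
       "PREDICTED: protein STRUBBELIG-RECEPTOR FAMILY 3 [Malus domestica]")

# ^      start of string
# [^:]*  zero or more characters that are not a colon
# :      the first colon
# \\s*   zero or more whitespace characters after it
# sub() replaces only the first match in each element.
res_regex <- sub("^[^:]*:\\s*", "", x)
print(res_regex)
```

Note that this pattern strips everything up to the first colon in any element that contains one, so it is only appropriate when no string has a colon you want to keep.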