Splitting a string into a list of substrings


I have a string id <- "Hello these are words N12345678 hooray how fun".

I would like to extract just N12345678 from this string.

So far I have used strsplit(id, " "). Now I have

>id
>[[1]]
>[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
>[8] "fun"

This is of type list and has length 1, despite apparently having 8 elements.

If I then use id <- id[grep("^[N][0-9]",id)], id is an empty list.

I think what I need to do is split the string into a list of length 8 with each element as a substring and then grep should be able to pick out the pattern, but I'm not sure how to go about that.

4

There are 4 answers

0
Mehdi On BEST ANSWER

If you insist on using strsplit, I think this can solve the problem:

id <- "Hello these are words N12345678 hooray how fun"
id <- strsplit(id, " ")
# strsplit returns a list; [[1]] pulls out the character vector of words
id[[1]][grep("^[N][0-9]", id[[1]])]

Notice that I haven't changed your regex. A more precise expression, such as ^N\\d+$, would match only whole words made of N followed by digits.
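
To illustrate why the original grep came back empty, here is a minimal sketch. strsplit returns a list holding a single character vector, so grep was handed the list itself (which it coerces to one long string) rather than the eight words. The names `parts`, `words` and `result` are only for this example.

```r
id <- "Hello these are words N12345678 hooray how fun"

parts <- strsplit(id, " ")   # a list of length 1
words <- parts[[1]]          # the character vector of 8 words inside it

length(parts)
length(words)

# grepl tests each word separately; ^N\\d+$ requires the whole
# word to be an N followed only by digits
result <- words[grepl("^N\\d+$", words)]
result
```

unlist(parts) would give the same character vector as parts[[1]] here, since only one string was split.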

0
Jilber Urbina On

Use regmatches

> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
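
If the string might hold more than one such ID, a possible variation is to swap regexpr for gregexpr, which finds every match instead of only the first. This is a sketch; the sample string `ids` is made up for illustration.

```r
ids <- "first N12345678 then N87654321 done"

# gregexpr locates all matches; regmatches returns a list with one
# element per input string, so [[1]] extracts the matches for ids
all_ids <- regmatches(ids, gregexpr("N[0-9]+", ids))[[1]]
all_ids
```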
0
codingMonster17 On

Do you know about strtok? It is a C function (from string.h) that parses your input line on certain characters. For the purpose of my example, I am breaking off a piece of my string every time I hit a space.

char *tempVar = strtok(string, " ");
// tempVar now points to "Hello", i.e. everything up to the first space
while (tempVar != NULL)
{
     // evaluate tempVar here, before grabbing the next word
     tempVar = strtok(NULL, " ");
     // tempVar now holds the next word, or NULL at the end of the string
}

Using this, your "Hello these are words N12345678 hooray how fun" would do this: tempVar would be "Hello", then "these", and so on.

Each time through the loop tempVar gets a new value. So I would suggest evaluating tempVar in the loop (before grabbing the next one) so that you can stop when you have N12345678.
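
Since the question is about R, the same word-by-word idea could be sketched there roughly as follows. This is only an assumed translation of the loop above, with `found` as a made-up variable name; strsplit stands in for strtok.

```r
id <- "Hello these are words N12345678 hooray how fun"

found <- NA_character_               # stays NA if no word matches
for (word in strsplit(id, " ")[[1]]) {
  # check the current word and stop as soon as it looks like the ID
  if (grepl("^N[0-9]+$", word)) {
    found <- word
    break
  }
}
found
```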

0
Shenglin Chen On

Try:

# remove every all-letter word, then trim the spaces left behind
trimws(gsub('\\b[a-zA-Z]+\\b','',id))