Splitting string into a list of substrings


I have a string id <- "Hello these are words N12345678 hooray how fun".

I would like to extract just N12345678 from this string.

So far I have used strsplit(id, " "). Now I have

[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
[8] "fun"

The result is of type list and has length 1, despite apparently having 8 elements.
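A quick way to see what strsplit() returned is to inspect it directly. The sketch below uses only base R. It shows that the result is a list with one element per input string, and that the eight words sit inside that single element as a character vector. The two-string input at the end is a made-up example.

```r
id <- "Hello these are words N12345678 hooray how fun"
parts <- strsplit(id, " ")

# strsplit() is vectorised over its input: it returns a list with one
# element per input string, so a single string gives a list of length 1.
class(parts)          # "list"
length(parts)         # 1

# The 8 words are a character vector stored inside that one element.
words <- parts[[1]]
length(words)         # 8

# With two input strings (made-up here), the list has two elements.
length(strsplit(c("a b", "c d e"), " "))   # 2
```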

If I then use id <- id[grep("^[N][0-9]",id)], id is an empty list.
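The empty result follows from how grep() treats a list. Its documentation says x may be any object that can be coerced with as.character(), and converting this one-element list deparses the whole word vector into a single string beginning with `c("Hello", ...`, which does not start with N. A minimal reproduction in base R:

```r
id <- "Hello these are words N12345678 hooray how fun"
parts <- strsplit(id, " ")

# grep() coerces the list with as.character(): the single element becomes
# one string that starts with 'c("Hello", ' rather than with "N".
as.character(parts)

hits <- grep("^[N][0-9]", parts)   # integer(0): nothing matched
id2 <- parts[hits]                 # subsetting a list with integer(0) gives list()
length(id2)                        # 0
```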

I think what I need to do is split the string into a list of length 8, with each substring as its own element, so that grep can pick out the pattern, but I'm not sure how to go about that.
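What grep() needs here is really a character vector of length 8 rather than a list. One way to get that, sketched below with base R only, is to flatten the strsplit() result with unlist(); grep() then tests each word separately.

```r
id <- "Hello these are words N12345678 hooray how fun"

# unlist() flattens the length-1 list into a plain character vector of 8 words.
words <- unlist(strsplit(id, " "))
length(words)                         # 8

hit <- words[grep("^N[0-9]", words)]
hit                                   # "N12345678"

# grepl() returns a logical vector, an equivalent way to subset.
words[grepl("^N[0-9]", words)]        # "N12345678"
```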


There are 4 answers


If you insist on using strsplit, I think this solves the problem:

id <- "Hello these are words N12345678 hooray how fun"
id = strsplit(id, " ")
id[[1]][grep("^N[0-9]", id[[1]])]

Notice that I haven't changed your regex. A more precise expression would be ^N\\d+$, which requires the whole word to be N followed only by digits.
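To illustrate why the anchored pattern is stricter, here is a small comparison on a hypothetical vector of words (the extra entries are invented for the example). The loose pattern only checks the first two characters, while ^N\\d+$ rejects anything that is not N followed by digits to the end of the word. R's default regex engine accepts \d as a digit class.

```r
words <- c("Hello", "N12345678", "N1abc", "NASA")   # hypothetical test words

loose  <- grep("^N[0-9]", words, value = TRUE)    # N then one digit, anything after
strict <- grep("^N\\d+$", words, value = TRUE)    # N then only digits, to the end

loose    # "N12345678" "N1abc"
strict   # "N12345678"
```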

Jilber Urbina

Use regmatches

> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
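Two details of this approach are worth knowing, sketched below with base R. regexpr() reports only the first match in each string (and -1 when there is none), so regmatches() returns character(0) for a string without an ID; gregexpr() returns every match instead. The strings "no id here" and "N111 and N222" are made-up inputs.

```r
id <- "Hello these are words N12345678 hooray how fun"
first <- regmatches(id, regexpr("N[0-9]+", id))
first                                   # "N12345678"

# No match: regexpr() gives -1 and regmatches() drops that string.
none_str <- "no id here"
none <- regmatches(none_str, regexpr("N[0-9]+", none_str))
none                                    # character(0)

# All matches: gregexpr() returns a list with one element per input string.
two <- "N111 and N222"
all_ids <- regmatches(two, gregexpr("N[0-9]+", two))[[1]]
all_ids                                 # "N111" "N222"
```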
codingMonster17

Do you know about C's strtok? It splits your input line at any of a given set of delimiter characters. In my example, I break off a piece of the string every time I hit a space.

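#include <stdio.h>
#include <string.h>
int main(void) {
    char string[] = "Hello these are words N12345678 hooray how fun"; // strtok modifies its input, so it must be writable
    char *tempVar = strtok(string, " "); // tempVar is everything up to the first space: "Hello"
    while (tempVar != NULL) {
        if (tempVar[0] == 'N') { printf("%s\n", tempVar); break; } // stop once the ID is reached
        tempVar = strtok(NULL, " "); // pick up the next word; NULL once the string is exhausted
    }
}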

Using this on your string "Hello these are words N12345678 hooray how fun", tempVar would first be "Hello", then "these", and so on.

Each time through the loop, tempVar gets a new value, so I would suggest checking tempVar inside the loop (before fetching the next word) so that you can stop when you reach N12345678.
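Since the question is about R, here is a rough R analogue of the same "visit one word at a time and stop at the ID" idea. It is a sketch using base R only, not a translation of strtok itself: the words come from strsplit(), and the stop condition assumes the ID is N followed only by digits.

```r
id <- "Hello these are words N12345678 hooray how fun"
words <- strsplit(id, " ")[[1]]

found <- NA_character_
for (w in words) {                 # one word per iteration, like the strtok() loop
  if (grepl("^N[0-9]+$", w)) {     # check the current word before moving on
    found <- w
    break                          # stop as soon as the ID is reached
  }
}
found                              # "N12345678"

# Base R's Find() expresses the same "first element that matches" search.
Find(function(w) grepl("^N[0-9]+$", w), words)   # "N12345678"
```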

Shenglin Chen

