Extracting different parts of string from dataframe

I have a data frame df with a single column in the following form. Each element is a character string.

 Well and Depth  
   Black Peak 1000
   Black Peak 1001
   Black Peak 1002
   Black Peak 10150
   Black Peak 10151  

I'd like to split each string into two parts: the number at the end, and all the text before the space that precedes that number. Also, once the number is extracted, how can I turn it from a character string into a usable integer? I intend to keep the extracted data in the data frame. After the transformation it would look like this:

  Well           Depth   
   Black Peak     1000
   Black Peak     1001
   Black Peak     1002
   Black Peak     10150
   Black Peak     10151  

Well and Depth would be two separate columns in the data frame df.

AntoniosK (accepted answer)

Data

# example dataset
df = data.frame(v = c("Black Peak 1000", "Black Peak 1001", "Black Peak 1002", 
                      "Black Peak 10150", "Black Peak 10151"), stringsAsFactors = FALSE)

Using base R

# split by last space, bind rows and save it as dataframe
df2 = data.frame(do.call(rbind, strsplit(df$v, ' (?=[^ ]+$)', perl=TRUE)), stringsAsFactors = FALSE)

# set names
names(df2) = c("Well", "Depth")

# convert Depth from character to numeric (as.integer() would give an integer column instead)
df2$Depth = as.numeric(df2$Depth)

df2

#         Well Depth
# 1 Black Peak  1000
# 2 Black Peak  1001
# 3 Black Peak  1002
# 4 Black Peak 10150
# 5 Black Peak 10151

Or using a tidyverse approach

library(tidyverse)

df %>% 
  separate(v, sep = ' (?=[^ ]+$)', into = c("Well","Depth")) %>%
  mutate(Depth = as.numeric(Depth))

#         Well Depth
# 1 Black Peak  1000
# 2 Black Peak  1001
# 3 Black Peak  1002
# 4 Black Peak 10150
# 5 Black Peak 10151
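A different base-R technique, not taken from either answer, is to skip splitting and use sub() with two capture groups: everything before the last space, and the trailing digits. The number is then converted with as.integer(), which gives the integer column the question asks for. This is a sketch that assumes every value ends in a space followed by digits; sub() returns a non-matching value unchanged, so as.integer() would turn its Depth into NA with a warning.

```r
# Example data mirroring the question's single character column `v`
df <- data.frame(v = c("Black Peak 1000", "Black Peak 1001", "Black Peak 1002",
                       "Black Peak 10150", "Black Peak 10151"),
                 stringsAsFactors = FALSE)

# Group 1: everything before the last space (greedy .* backtracks to it)
# Group 2: the digits that end the string
pattern <- "^(.*) ([0-9]+)$"

df$Well  <- sub(pattern, "\\1", df$v)
df$Depth <- as.integer(sub(pattern, "\\2", df$v))  # integer, not double

df$v <- NULL  # drop the original combined column, leaving Well and Depth
df
```

Using as.numeric() instead of as.integer() works too and matches the answers above; the choice only affects whether Depth is stored as an integer or a double.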

lisah

Try str_split() from stringr (https://www.rdocumentation.org/packages/stringr/versions/1.1.0/topics/str_split) and then convert the second column to numeric with, e.g., as.numeric().
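A minimal sketch of that suggestion, assuming the stringr package is installed. The split pattern (a space followed only by non-space characters up to the end of the string) is borrowed from the accepted answer, and `simplify = TRUE` makes str_split() return a character matrix instead of a list, so the two columns can be picked out directly.

```r
library(stringr)

df <- data.frame(v = c("Black Peak 1000", "Black Peak 1001", "Black Peak 1002",
                       "Black Peak 10150", "Black Peak 10151"),
                 stringsAsFactors = FALSE)

# Split only at the last space: the lookahead requires that nothing but
# non-space characters follows it up to the end of the string
parts <- str_split(df$v, " (?=[^ ]+$)", simplify = TRUE)

# Column 1 of the matrix is the well name, column 2 the depth as text
df2 <- data.frame(Well  = parts[, 1],
                  Depth = as.numeric(parts[, 2]),
                  stringsAsFactors = FALSE)
df2
```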