Split a string at multiple separators in R across multiple rows


I have the following tibble in R:

df <- tibble(desc=c("test1", "test2", "test3", "test4","test1"), code=c("X00.2", "Y10", "X20.234", "Z10", "Q23.2"))

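I want to create a new data frame with one row per prefix of each code: first the part before the dot, then one additional character after the dot at a time: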

df <- tibble(desc=c("test1", "test1", "test2", "test3", "test3", "test3", "test3", "test4", "test1", "test1"), code=c("X00", "X00.2", "Y10", "X20", "X20.2", "X20.23", "X20.234", "Z10", "Q23", "Q23.2"))

How would I do this? I think it can be done with separate_rows() from tidyr by manipulating the separator, but I'm not sure exactly how.

Thank you in advance.

Best answer, by Ronak Shah

Here is one way using tidyverse functions.

library(tidyverse)

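df %>%
  # n is the number of rows each code expands to:
  # one per character after the ".", plus one for the part before it
  mutate(n = nchar(sub('.*\\.', '', code)) + 1,
         # l is the position of the "." (NA when the code has none)
         l = str_locate(code, '\\.')[, 1],
         # Codes without a "." produce a single row
         n = replace(n, is.na(l), 1),
         # Without a ".", cut at the end of the string instead
         l = ifelse(is.na(l), nchar(code), l),
         # r identifies the original row, since desc values can repeat
         r = row_number()) %>%
  # Repeat each row n times
  uncount(n) %>%
  # Group by original row (not by desc)
  group_by(r) %>%
  # Build the code value one character at a time, starting at the "."
  mutate(code = map_chr(row_number(), ~substr(first(code), 1, l + .x - 1)),
         # Remove the trailing "." left on the first row of each group
         code = sub('\\.$', '', code)) %>%
  ungroup() %>%
  select(-l, -r)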

This returns

# A tibble: 10 x 2
#   desc  code   
#   <chr> <chr>  
# 1 test1 X00    
# 2 test1 X00.2  
# 3 test2 Y10    
# 4 test3 X20    
# 5 test3 X20.2  
# 6 test3 X20.23 
# 7 test3 X20.234
# 8 test4 Z10    
# 9 test1 Q23    
#10 test1 Q23.2
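
For comparison, here is a minimal sketch of another way to get the same rows: build the vector of prefixes for each code, store it in a list column, and unnest it. This is not the method from the answer above. The helper name expand_code is made up for illustration, and the sketch assumes each code contains at most one "." and that a "." is always followed by at least one character.

```r
library(dplyr)
library(tidyr)

df <- tibble(desc = c("test1", "test2", "test3", "test4", "test1"),
             code = c("X00.2", "Y10", "X20.234", "Z10", "Q23.2"))

# Hypothetical helper: return the part before the "." followed by every
# longer prefix of the code. Codes without a "." are returned unchanged.
expand_code <- function(code) {
  dot <- regexpr(".", code, fixed = TRUE)   # position of the ".", or -1
  if (dot == -1) return(code)
  stem <- substr(code, 1, dot - 1)
  ends <- seq(dot + 1, nchar(code))         # end positions after the "."
  c(stem, substring(code, 1, ends))         # substring() is vectorised over `last`
}

result <- df %>%
  # Replace each code with a character vector of its prefixes (list column)
  mutate(code = lapply(code, expand_code)) %>%
  # One row per prefix; desc is repeated automatically
  unnest(cols = code)

result
```

Because the expansion happens per element, there is no need for a row identifier or grouping, which may be easier to follow when desc values repeat.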