Replace parts of string using package stringi (regex)


I have a string:

string <- "abbccc"

I want to replace each run of the same letter with that letter followed by the number of times it occurs. So I want to get something like this: "ab2c3"

I use the stringi package to do this, but it doesn't work exactly the way I want. Let's say I already have a vector with the replacement parts:

library(stringi)
vector <- c("b2", "c3")
stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector)

The output:

[1] "ab2b2" "ac3c3"

The output I want: [1] "ab2c3"

I also tried it this way:

stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector, vectorize_all=FALSE)

but I get an error:

Error in stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector, vectorize_all = FALSE) : 
  vector length not consistent with other arguments

There are 2 answers

Tyler Rinker (best answer)

Not regex, but strsplit and rle with some paste magic:

string <- c("abbccc", "bbaccc", "uffff", "aaabccccddd")

sapply(lapply(strsplit(string, ""), rle), function(x) {
    paste(x[[2]], ifelse(x[[1]] == 1, "", x[[1]]), sep="", collapse="")
})

## [1] "ab2c3"   "b2ac3"   "uf4"     "a3bc4d3"
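
To unpack the paste magic: rle returns the run lengths as its first element and the run values as its second, which is what x[[1]] and x[[2]] refer to above. A minimal sketch for a single string, using the named fields instead of positions (the variable name out is only for illustration):

```r
# run-length encode the characters of one string
x <- rle(strsplit("abbccc", "")[[1]])
x$lengths  # 1 2 3 (same as x[[1]])
x$values   # "a" "b" "c" (same as x[[2]])

# glue each letter to its count, leaving out counts of 1
out <- paste(x$values, ifelse(x$lengths == 1, "", x$lengths), sep = "", collapse = "")
out  # "ab2c3"
```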
nicola

Not a stringi solution and not a regex either, but you can do it by splitting the string and using rle:

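    string <- "abbccc"
    # rle() returns lengths then values; [2:1] puts each letter before its count
    res <- paste(collapse = "", do.call(paste0, rle(strsplit(string, "", fixed = TRUE)[[1]])[2:1]))
    # drop counts of exactly 1; a plain gsub("1", "", res) would also mangle 10, 11, ...
    gsub("(?<![0-9])1(?![0-9])", "", res, perl = TRUE)
    #[1] "ab2c3"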
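
Since the question asked about stringi, here is a minimal sketch that stays within that package. Instead of replacing matches in place, it extracts every run with a backreference and rebuilds the string from the letter and the run length. The helper name compress_runs is hypothetical, and the sketch assumes non-empty, non-NA input strings.

```r
library(stringi)

# compress_runs is a hypothetical helper name, not part of stringi
compress_runs <- function(x) {
  # split each string into maximal runs of one repeated character
  runs <- stri_extract_all_regex(x, "(.)\\1*")
  vapply(runs, function(r) {
    n <- stri_length(r)  # length of each run
    # first character of the run, followed by its count unless the count is 1
    paste0(stri_sub(r, 1, 1), ifelse(n == 1, "", n), collapse = "")
  }, character(1))
}

compress_runs(c("abbccc", "uffff"))
## [1] "ab2c3" "uf4"
```

As far as I understand the stringi documentation, the original attempts fail because stri_replace_all_regex is vectorized over all of its arguments: with two replacements the string is recycled and every match gets the same replacement, and with vectorize_all = FALSE the replacement vector is expected to line up with the pattern vector, not with the individual matches.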