Removing second or third occurrence of a pattern from a string


I want to remove a specific letter from a string.

Removing the first A works with str_remove(). But now I want to remove only the second A, or only the third A. Is there a function for that, or any suggestions?

I would like to get these results (with the first, second, and third A removed, respectively):

GTGAGGA GTAGGGA GTAGAGG

Thanks a lot for any help!

library(stringr)

x <- c("GTAGAGGA")
str_remove(x, "A")


There are 3 answers

gabalz On

Here is one way:

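# Locate every "A"; the result is a matrix with one row per match
pos <- str_locate_all(x, "A")[[1]]
# seq_len() keeps the loop from running when there are no matches
for (row in seq_len(nrow(pos))) {
    y <- x
    # Delete only the match at this position from a fresh copy of x
    str_sub(y, pos[row, 1], pos[row, 2]) <- ""
    print(y)
}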

which prints:

[1] "GTGAGGA"
[1] "GTAGGGA"
[1] "GTAGAGG"
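
The loop above prints each variant; if the results are needed as a vector instead, the same idea can be wrapped in a function. This is only a sketch: the name remove_each_occurrence is made up for illustration, and fixed() is added on the assumption that the pattern should be matched literally rather than as a regular expression.

```r
library(stringr)

# Hypothetical helper (not part of stringr): returns one string per match,
# each with exactly one occurrence of `pattern` removed.
remove_each_occurrence <- function(x, pattern) {
  # fixed() treats `pattern` as literal text; this is an assumption
  pos <- str_locate_all(x, fixed(pattern))[[1]]
  vapply(
    seq_len(nrow(pos)),
    function(i) {
      y <- x
      # Blank out only the i-th match in a fresh copy of x
      str_sub(y, pos[i, "start"], pos[i, "end"]) <- ""
      y
    },
    character(1)
  )
}

remove_each_occurrence("GTAGAGGA", "A")
#> [1] "GTGAGGA" "GTAGGGA" "GTAGAGG"
```

Indexing the result with [2] or [3] then gives the string with only the second or only the third A removed.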
Ben On

This might be a generalizable function you can use. Call it with a character string, the character whose occurrences should be counted (such as "A" in the example above), and n, the number of the occurrence to remove.

The pattern in the sub() call captures n - 1 repetitions of the character, each followed lazily by zero or more other characters, and then matches the nth occurrence of the character. The replacement "\\1" keeps only the captured part, so the nth occurrence is dropped.

rm_char <- function(the_str, the_char, n) {
  sub(
    paste0("((?:", the_char, ".*?){", n - 1, "})", the_char), 
    "\\1", 
    the_str, 
    perl = TRUE
  )
}

rm_char("GTAGAGGA", "A", 1)
#> [1] "GTGAGGA"
rm_char("GTAGAGGA", "A", 2)
#> [1] "GTAGGGA"
rm_char("GTAGAGGA", "A", 3)
#> [1] "GTAGAGG"

Created on 2024-02-15 with reprex v2.0.2
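
One thing to keep in mind with the regex version: the_char is pasted into the pattern, so a character such as "." or "+" is interpreted as regex syntax rather than literally. A possible alternative, sketched here in base R, locates literal matches with gregexpr(fixed = TRUE) and cuts the nth one out. The name rm_char_fixed is hypothetical, and the function assumes n is a positive integer.

```r
# Hypothetical helper, not from the answer above: removes the nth literal
# occurrence of `the_char` from `the_str` without building a regex.
# Assumes n is a positive integer.
rm_char_fixed <- function(the_str, the_char, n) {
  # fixed = TRUE matches `the_char` literally, so "." means a real dot
  hits <- gregexpr(the_char, the_str, fixed = TRUE)[[1]]
  # gregexpr() returns -1 when nothing matches; also guard n past the last hit
  if (hits[1] == -1 || n > length(hits)) {
    return(the_str)
  }
  start <- hits[n]
  len <- attr(hits, "match.length")[n]
  # Glue together everything before and everything after the nth match
  paste0(
    substr(the_str, 1, start - 1),
    substr(the_str, start + len, nchar(the_str))
  )
}

rm_char_fixed("GTAGAGGA", "A", 2)
#> [1] "GTAGGGA"
rm_char_fixed("a.b.c", ".", 2)
#> [1] "a.bc"
```

If there are fewer than n occurrences, this sketch returns the string unchanged.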

Adriano Mello On

Just another take with purrr inside a tibble. The regex answer (@Ben's) is the best.

library(tidyverse)

# ------------------
# The strings
my_strings <- tribble(
                    ~string,
  "GTGAGGA GTAGGGA GTAGAGG",
  "GTAGAGG GTGAGGA GTAGGGA",
  "GTAGGGA GTAGAGG GTGAGGA")

# Occurrences to remove
ctrl <- c(2, 3)

# ------------------
my_strings <- mutate(
  my_strings,
  new_string = str_split(string, ""),

  split = map(new_string, \(x) if_else(x == "A", 1, 0)),
  split = map(split, \(x) if_else(x == 1, cumsum(x), x)),
  split = map(split, \(x) x %in% ctrl == FALSE),

  new_string = map2_chr(new_string, split, \(x, y) str_flatten(x[y])))

my_strings <- select(my_strings, - split)


# ------------------
> my_strings
# A tibble: 3 × 2
  string                  new_string           
  <chr>                   <chr>             
1 GTGAGGA GTAGGGA GTAGAGG GTGAGG GTGGGA GTAGAGG
2 GTAGAGG GTGAGGA GTAGGGA GTAGGG GTGGGA GTAGGGA
3 GTAGGGA GTAGAGG GTGAGGA GTAGGG GTGAGG GTGAGGA
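
The counting logic in the pipeline above may be easier to follow on a single string. The following base R sketch mirrors the same steps (split into characters, number each hit with cumsum(), drop the unwanted numbers); the function name remove_occurrences and its arguments are made up for illustration.

```r
# Hypothetical helper: drop the occurrences of `char` whose running
# number appears in `which`, e.g. which = c(2, 3).
remove_occurrences <- function(x, char, which) {
  chars <- strsplit(x, "")[[1]]   # one element per character
  is_hit <- chars == char
  # Running count at each hit, 0 everywhere else
  occurrence <- ifelse(is_hit, cumsum(is_hit), 0)
  # Keep every character whose occurrence number is not listed
  keep <- !(occurrence %in% which)
  paste(chars[keep], collapse = "")
}

remove_occurrences("GTAGAGGA", "A", c(2, 3))
#> [1] "GTAGGG"
```

Note that, like the tibble version, this removes all listed occurrences from one string at once; it does not produce a separate string per occurrence.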