Split column by separator and delete values contained in other values


I have a category column whose values are separated by ";". For example:

value <- "A > B > C; A > B > D; A > B > C > C1"

It means:

The current product belongs to category "A > B > C", to category "A > B > D" and to category "A > B > C > C1"

If a category is already contained in another one, it should be removed. So the goal is:

expectedResult <- "A > B > D; A > B > C > C1"

because "A > B > C > C1" already contains "A > B > C".

How can I solve this?

Note: I know that there are hundreds of questions that seem similar, but I couldn't find a solution among them.


There are 2 answers

2
Sirius

This ought to work:


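library(stringr)  # provides str_detect() and fixed()

value <- "A > B > C; A > B > D; A > B > C > C1"
els <- strsplit( value, "; " )[[1]]

# a: categories kept so far, b: the next category
my_reducer  <- function(a,b) {
    v <- str_detect( b, fixed(a) )  # which kept categories occur inside b?
    a <- a[!v]                      # drop those
    append(a,b)
}

paste( Reduce( my_reducer, els ), collapse="; " )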

Output:


> paste( Reduce( my_reducer, els ), collapse="; " )
[1] "A > B > D; A > B > C > C1"
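
One caveat: the reducer only drops an earlier entry when a later entry contains it, so the result depends on the order of the categories. Below is a minimal base R sketch that does not depend on the order, assuming plain substring containment as in the question. The helper name `drop_contained` is made up for illustration.

```r
# Hypothetical helper: keep only entries that are not a substring of another entry.
drop_contained <- function(value, sep = "; ") {
  # Split on ";", trim surrounding spaces, and drop exact duplicates
  els <- unique(trimws(strsplit(value, ";", fixed = TRUE)[[1]]))
  # contained[i] is TRUE when els[i] occurs inside some other entry
  contained <- vapply(seq_along(els), function(i) {
    any(grepl(els[i], els[-i], fixed = TRUE))
  }, logical(1))
  paste(els[!contained], collapse = sep)
}

# The longer category comes first here, and the shorter one is still removed
drop_contained("A > B > C > C1; A > B > D; A > B > C")
# [1] "A > B > C > C1; A > B > D"
```

Exact duplicates are collapsed with `unique()` first, because otherwise two identical entries would each count as contained in the other and both would disappear.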

0
ThomasIsCoding

Perhaps you can try the code below

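# `value` is the string defined in the question
v <- unlist(strsplit(value, ";\\s+"))
# Column x of the matrix marks the entries of v that contain x (and are longer than x)
idx <- colSums(`diag<-`(sapply(v, function(x) {
  p <- gsub(x, "", v, fixed = TRUE)
  p != v & nchar(p) > 0
}), FALSE)) == 0
# Keep only the entries that no other entry contains
paste0(names(idx)[idx], collapse = "; ")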

which gives

[1] "A > B > D; A > B > C > C1"
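
Since the question is about a whole column, here is a rough sketch of how the same idea might be applied to every row of a data frame. It rests on an assumption the question does not state: a category counts as "contained" only when it is an ancestor path of another one, that is, when another entry starts with it followed by " > ". Under that reading "A > B > C" is not dropped because of "A > B > CD". The names `drop_ancestors`, `df`, `category` and `category_reduced` are hypothetical.

```r
# Hypothetical helper: for each cell, drop categories that are an ancestor of
# another category in the same cell (prefix match on whole path levels).
drop_ancestors <- function(x, sep = "; ", level_sep = " > ") {
  vapply(strsplit(x, ";", fixed = TRUE), function(parts) {
    els <- unique(trimws(parts))
    # e is an ancestor if some entry starts with e followed by the level separator
    is_ancestor <- vapply(els, function(e) {
      any(startsWith(els, paste0(e, level_sep)))
    }, logical(1))
    paste(els[!is_ancestor], collapse = sep)
  }, character(1))
}

# Made-up example data; the column name `category` is an assumption
df <- data.frame(
  id = 1:2,
  category = c("A > B > C; A > B > D; A > B > C > C1",
               "A > B > CD; A > B > C"),
  stringsAsFactors = FALSE
)

df$category_reduced <- drop_ancestors(df$category)
df$category_reduced
# [1] "A > B > D; A > B > C > C1" "A > B > CD; A > B > C"
```

If plain substring matching is what you want instead, swap the `startsWith()` test for `grepl(e, setdiff(els, e), fixed = TRUE)`.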