Comparison of two sets with repeated values in R


I would like to compare two sets efficiently using the setdiff and intersect functions, but they are not working the way I want. I would like to compare the elements of two sets and see which elements are different.

For example:

Aset = c("AAAAA", "AABBB", "AAABB", "BBBBB")
Bset = c("AAABB", "AAABB", "BBBAA", "BBBBB")

# present in Aset but not in Bset
setdiff(Aset, Bset)
[1] "AAAAA" "AABBB"

# present in Bset but not in Aset
setdiff(Bset, Aset)
[1] "BBBAA"

# present in both Aset and Bset
intersect(Aset, Bset)
[1] "AAABB" "BBBBB"

However, when a vector contains repeated values, these functions treat the copies as a single element (which is mathematically correct for sets), but I want to see how many elements match while taking the duplicates into account, i.e. treating each vector as a multiset.
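According to R's help page for the set functions (`?union`), union, intersect, setdiff and setequal discard duplicated values in their arguments. Filtering with `%in%` instead keeps the repeated elements of the first vector, but it still only asks whether a value occurs in the second vector, not how many times. A minimal illustration, using the Cset and Dset values from the example below:

```r
Cset <- c("AAAAA", "BBBBB", "AAABB", "BBBBB")
Dset <- c("AAABB", "AAABB", "ABBBB", "BBBBB")

# %in% tests membership element by element, so duplicates in Cset are kept,
# but the number of copies present in Dset is ignored
in_both <- Cset[Cset %in% Dset]
only_c  <- Cset[!Cset %in% Dset]

print(in_both)  # [1] "BBBBB" "AAABB" "BBBBB"  (two "BBBBB", though Dset has one)
print(only_c)   # [1] "AAAAA"                  (the surplus "BBBBB" is missing)
```

So neither setdiff/intersect nor plain `%in%` filtering compares the counts of each value.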

Cset = c("AAAAA", "BBBBB", "AAABB", "BBBBB")
Dset = c("AAABB", "AAABB", "ABBBB", "BBBBB")

# present in Cset but not in Dset
setdiff(Cset, Dset)
[1] "AAAAA"
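One way to make the mismatch visible is to count each value in both vectors. This sketch (the names `lv`, `counts` and `matched` are only illustrative) tabulates Cset and Dset over the union of their values; for each value, the smaller of the two counts is how many copies can be paired up:

```r
Cset <- c("AAAAA", "BBBBB", "AAABB", "BBBBB")
Dset <- c("AAABB", "AAABB", "ABBBB", "BBBBB")

# Use the same levels for both vectors so the count columns line up
lv <- union(Cset, Dset)
counts <- rbind(
  Cset = as.integer(table(factor(Cset, levels = lv))),
  Dset = as.integer(table(factor(Dset, levels = lv)))
)
colnames(counts) <- lv
print(counts)

# A value can be matched at most min(count in Cset, count in Dset) times
matched <- pmin(counts["Cset", ], counts["Dset", ])
print(sum(matched))  # [1] 2
```

Two of the four elements of Cset are matched (one "BBBBB" and one "AAABB"); the remaining two, "AAAAA" and the second "BBBBB", are the surplus.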

Cset contains one more "BBBBB" than Dset (two copies versus one). So I want an alternative function that takes duplicated values into account and gives something like this:

[1] "AAAAA" "BBBBB"
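A possible approach, sketched under the assumption that a multiset difference is what is wanted (each copy in x can be cancelled by at most one copy in y): number every occurrence of each value, so the second "BBBBB" becomes a different key from the first, then apply an ordinary membership test to those keys. `occurrence()` and `multiset_setdiff()` are hypothetical helpers, not base R functions:

```r
# Number each occurrence of a value: c("a", "b", "a") -> 1, 1, 2
occurrence <- function(v) ave(seq_along(v), v, FUN = seq_along)

# Hypothetical helper: multiset difference, keeping every copy of a value
# in x beyond the number of copies of that value in y
multiset_setdiff <- function(x, y) {
  # Pair each value with its occurrence number, so the k-th copy in x is
  # only cancelled if y has at least k copies of that value.
  # Assumes the values never contain the separator character "\u001F".
  kx <- paste(x, occurrence(x), sep = "\u001F")
  ky <- paste(y, occurrence(y), sep = "\u001F")
  x[!kx %in% ky]
}

Cset <- c("AAAAA", "BBBBB", "AAABB", "BBBBB")
Dset <- c("AAABB", "AAABB", "ABBBB", "BBBBB")
print(multiset_setdiff(Cset, Dset))  # [1] "AAAAA" "BBBBB"
```

Unlike setdiff(), this is count-sensitive in both directions: with the first example, multiset_setdiff(Bset, Aset) would return "AAABB" "BBBAA", because Bset holds one more copy of "AAABB" than Aset.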

intersect shows similar behavior (which is also correct by definition):

Eset = c("AAAAA", "BBBBB", "AAAAA", "BBBBB")
Fset = c("BBBBB", "AAAAA", "BBBBB", "AAAAA")

intersect(Eset, Fset)
[1] "AAAAA" "BBBBB"

What I would like to see is that all four elements match:

[1] "AAAAA" "BBBBB" "AAAAA" "BBBBB"
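The same occurrence-numbering idea gives a count-aware intersection that keeps each value min(count in x, count in y) times, in the order it appears in x. Again only a sketch; `occurrence()` and `multiset_intersect()` are hypothetical helper names:

```r
# Number each occurrence of a value: c("a", "b", "a") -> 1, 1, 2
occurrence <- function(v) ave(seq_along(v), v, FUN = seq_along)

# Hypothetical helper: multiset intersection. The k-th copy of a value in x
# is kept only if y has at least k copies of that value.
multiset_intersect <- function(x, y) {
  # Assumes the values never contain the separator character "\u001F"
  kx <- paste(x, occurrence(x), sep = "\u001F")
  ky <- paste(y, occurrence(y), sep = "\u001F")
  x[kx %in% ky]
}

Eset <- c("AAAAA", "BBBBB", "AAAAA", "BBBBB")
Fset <- c("BBBBB", "AAAAA", "BBBBB", "AAAAA")
print(multiset_intersect(Eset, Fset))  # [1] "AAAAA" "BBBBB" "AAAAA" "BBBBB"
```

With the second example, multiset_intersect(Cset, Dset) would give "BBBBB" "AAABB": one copy of each value the two vectors share.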

I am looking for an alternative function that fits my needs.
