I would like to transform this data:
Sample Genotype Region
sample1 A Region1
sample1 B Region1
sample1 A Region1
sample2 A Region1
sample2 A Region1
sample3 A Region1
sample4 B Region1
into the following format, tagging with "E" any sample that has more than one genotype and collapsing samples that repeat the same genotype into a single row:
Sample Genotype Region
sample1 E Region1
sample2 A Region1
sample3 A Region1
sample4 B Region1
I have one list with many regions (Region1 to Regionx). Is it possible to do this in R? Thanks a lot.
One straightforward approach is to use aggregate. Assuming your data.frame is called "mydf" (and building on Jorg's comment):
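A minimal sketch of such a call, assuming "mydf" has the columns Sample, Genotype and Region shown in the question. The anonymous function (returning "E" when a group holds more than one distinct genotype) is one reading of the requirement, not necessarily the exact code the comment had in mind:

```r
# Example data copied from the question
mydf <- data.frame(
  Sample   = c("sample1", "sample1", "sample1", "sample2",
               "sample2", "sample3", "sample4"),
  Genotype = c("A", "B", "A", "A", "A", "A", "B"),
  Region   = "Region1",
  stringsAsFactors = FALSE
)

# Group by Sample and Region; within each group keep the genotype
# if it is unique, otherwise tag the group with "E"
result <- aggregate(
  Genotype ~ Sample + Region,
  data = mydf,
  FUN = function(x) {
    u <- unique(as.character(x))
    if (length(u) > 1) "E" else u
  }
)

# Restore the column order used in the question
result <- result[, c("Sample", "Genotype", "Region")]
result
```

Because Region is part of the grouping formula, the same call should handle data with many regions: each Sample/Region combination is collapsed separately.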