Convert many lines into a single line

I would like to transform this data:

    Sample  Genotype  Region
    sample1    A      Region1
    sample1    B      Region1
    sample1    A      Region1
    sample2    A      Region1
    sample2    A      Region1
    sample3    A      Region1
    sample4    B      Region1

Into this format, tagging samples that have more than one genotype with "E" and collapsing samples whose genotype is simply repeated into a single row:

    Sample  Genotype  Region   
    sample1    E      Region1
    sample2    A      Region1
    sample3    A      Region1
    sample4    B      Region1

I have one list with many regions (Region1 - Regionx). Is it possible to do this in R? Thanks a lot.

A5C1D2H2I1M1N2O1R2T1 On

One straightforward approach is to use aggregate. Assuming your data.frame is called "mydf" (and building on Jorg's comment):

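    # Collapse each Sample/Region group to a single genotype
    aggregate(Genotype ~ ., mydf, function(x) {
      a <- unique(as.character(x))    # as.character() guards against factor columns
      if (length(a) > 1) "E" else a   # more than one distinct genotype -> "E"
    })
    #    Sample  Region Genotype
    # 1 sample1 Region1        E
    # 2 sample2 Region1        A
    # 3 sample3 Region1        A
    # 4 sample4 Region1        B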
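
Since the question mentions many regions, here is a minimal reproducible sketch of the same idea across more than one region. The Region2 rows are invented for illustration, and the helper name collapse_genotype is hypothetical; only the Region1 rows come from the question.

```r
# Rows for Region1 are taken from the question; the two Region2 rows are
# made up here purely to show that grouping works per Sample *and* Region.
mydf <- data.frame(
  Sample   = c("sample1", "sample1", "sample1", "sample2", "sample2",
               "sample3", "sample4", "sample1", "sample1"),
  Genotype = c("A", "B", "A", "A", "A", "A", "B", "B", "B"),
  Region   = c(rep("Region1", 7), rep("Region2", 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single genotype of a group,
# or "E" when the group contains more than one distinct genotype.
collapse_genotype <- function(x) {
  a <- unique(as.character(x))
  if (length(a) > 1) "E" else a
}

# One output row per Sample/Region combination
result <- aggregate(Genotype ~ Sample + Region, data = mydf,
                    FUN = collapse_genotype)
print(result)
```

With this input, sample1 should come out as "E" in Region1 (genotypes A and B) but as "B" in the invented Region2, because each Sample/Region pair is evaluated on its own. Writing the grouping columns out explicitly (Sample + Region) instead of "." is just a design choice that keeps any extra columns in the data from being treated as grouping variables.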