Calculating Bray-Curtis dissimilarity between each treatment and the control within each group

I need to calculate the Bray-Curtis dissimilarity of two treatments to a control, but the design is nested by site. If I use the entire data frame, the Bray-Curtis dissimilarity also compares samples from different sites, which I need to avoid. Below is some mock data:

library(vegan)

Site = c("A", "A", "A", "B", "B", "B", "C", "C", "C") 
Treatment = c('control', 'treatment1', 'treatment2', 'control', 'treatment1', 'treatment2','control', 'treatment1', 'treatment2') 
Sp1 = c(56, 42, 67, 23, 44, 21, 15, 20, 12) 
Sp2 = c(15, 10, 17, 1, 5, 2, 3, 1,6)
Sp3 = c(10, 6, 7, 10, 5, 4, 0, 1, 0)
Sp4 = c(9, 6, 4, 8, 13, 5, 2, 1, 0)
df = data.frame(Site, Treatment, Sp1, Sp2, Sp3, Sp4)

My ideal output would be the same data frame with an extra column holding the dissimilarity to the control of the same site. The distance of control to control would obviously be 0 (or NA, it doesn't matter). I am using the following command to calculate the Bray-Curtis dissimilarity:

matrix <- df[,3:6]
braycurtis = vegdist(matrix, "bray")

Below is an example of what the result would look like (the numbers in the new column are made up, not the real output):

Site = c("A", "A", "A", "B", "B", "B", "C", "C", "C") 
Treatment = c('control', 'treatment1', 'treatment2', 'control', 'treatment1', 'treatment2','control', 'treatment1', 'treatment2') 
Sp1 = c(56, 42, 67, 23, 44, 21, 15, 20, 12) 
Sp2 = c(15, 10, 17, 1, 5, 2, 3, 1,6)
Sp3 = c(10, 6, 7, 10, 5, 4, 0, 1, 0)
Sp4 = c(9, 6, 4, 8, 13, 5, 2, 1, 0)
df = data.frame(Site, Treatment, Sp1, Sp2, Sp3, Sp4)
df$dis.to.control=c(0,0.2,0.7,0,0.4,0.6,0,0.6,0.0)

Any help would be most welcome!

I don't know how to go about it, so the only thing I have tried is manually splitting my data by site and calculating the distances by hand. My real data has over 15 treatments and 20 sites, so doing it this way would be very time-consuming.

There is 1 answer

pookpash

I am not too familiar with ecology, so please double-check the results, but the code below seems to do what you want.

library(vegan)

Site <- c("A", "A", "A", "B", "B", "B", "C", "C", "C") 
Treatment <- c('control', 'treatment1', 'treatment2', 'control', 'treatment1', 
               'treatment2','control', 'treatment1', 'treatment2') 
Sp1 <- c(56, 42, 67, 23, 44, 21, 15, 20, 12) 
Sp2 <- c(15, 10, 17, 1, 5, 2, 3, 1,6)
Sp3 <- c(10, 6, 7, 10, 5, 4, 0, 1, 0)
Sp4 <- c(9, 6, 4, 8, 13, 5, 2, 1, 0)
df <- data.frame(Site, Treatment, Sp1, Sp2, Sp3, Sp4)

df$dis.to.control <- NA

for(i in unique(df$Site)) {
  #create temporary df for each site to make it easier to retrieve specific distances
  t_df <- df[df$Site == i,]
  #calculate distance for control vs treatments.
  bray_t1 <- vegdist(t_df[t_df$Treatment %in% c("control", "treatment1"),3:6], "bray")
  bray_t2 <- vegdist(t_df[t_df$Treatment %in% c("control", "treatment2"),3:6], "bray")
  #replace distance in original dataframe
  df$dis.to.control[df$Site == i & df$Treatment == "treatment1"] <- bray_t1
  df$dis.to.control[df$Site == i & df$Treatment == "treatment2"] <- bray_t2
}

This returns:

  Site  Treatment Sp1 Sp2 Sp3 Sp4 dis.to.control
1    A    control  56  15  10   9             NA
2    A treatment1  42  10   6   6      0.1688312
3    A treatment2  67  17   7   4      0.1135135
4    B    control  23   1  10   8             NA
5    B treatment1  44   5   5  13      0.3211009
6    B treatment2  21   2   4   5      0.1621622
7    C    control  15   3   0   2             NA
8    C treatment1  20   1   1   1      0.2093023
9    C treatment2  12   6   0   0      0.2105263

The code is a bit more complicated than your example strictly requires, because it explicitly looks up sites and treatment names. However, that makes it more general if your actual data is not ordered as clearly as your example.
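
As a sanity check on the output, one value can be reproduced by hand. The sketch below assumes the usual Bray-Curtis definition for abundance data (sum of absolute differences divided by the sum of all abundances), which is what vegdist computes for method "bray". The helper name `bray_pair` is made up for this illustration and is not part of vegan.

```r
# Bray-Curtis dissimilarity between two abundance vectors:
# sum of absolute differences divided by the total abundance of both samples.
# (bray_pair is a hypothetical helper, not a vegan function.)
bray_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

control_A    <- c(56, 15, 10, 9)  # site A, control (Sp1..Sp4)
treatment1_A <- c(42, 10, 6, 6)   # site A, treatment1 (Sp1..Sp4)

bc_A1 <- bray_pair(control_A, treatment1_A)
print(bc_A1)  # 26 / 154, i.e. about 0.1688, matching row 2 of the output
```

Because only the two rows of one site enter the calculation, samples from other sites cannot influence the result.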

There is most likely a way of doing this in fewer lines of code, so I am interested in other answers as well.
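
One possible variant that does not hard-code the treatment names, which may matter with 15+ treatments, is sketched below. It computes the full distance matrix within each site and keeps only the control's row. It assumes that every site has exactly one row labelled "control" and that the species columns are the ones listed in `sp_cols` (a variable introduced here for illustration). Please check it against your real data before relying on it.

```r
library(vegan)

df <- data.frame(
  Site      = rep(c("A", "B", "C"), each = 3),
  Treatment = rep(c("control", "treatment1", "treatment2"), times = 3),
  Sp1 = c(56, 42, 67, 23, 44, 21, 15, 20, 12),
  Sp2 = c(15, 10, 17, 1, 5, 2, 3, 1, 6),
  Sp3 = c(10, 6, 7, 10, 5, 4, 0, 1, 0),
  Sp4 = c(9, 6, 4, 8, 13, 5, 2, 1, 0)
)

sp_cols <- c("Sp1", "Sp2", "Sp3", "Sp4")  # species columns; adjust for real data
df$dis.to.control <- NA_real_

for (s in unique(df$Site)) {
  rows <- which(df$Site == s)                     # row indices of this site
  ctrl <- which(df$Treatment[rows] == "control")  # position of the control within the site
  stopifnot(length(ctrl) == 1)                    # assumption: exactly one control per site
  # Distances among the samples of this site only; keep the control's row.
  d <- as.matrix(vegdist(df[rows, sp_cols], method = "bray"))
  df$dis.to.control[rows] <- d[ctrl, ]
}

print(df)
```

With this version the control rows get a dissimilarity of 0 instead of NA, and any number of treatments per site is handled without listing them.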