Drawing confidence bands around multiple lines on the same graph


I have run a Moran's I analysis, which looks for spatial relationships among features. The analysis was done using the correlog function in the ncf R package on the first 3 principal components generated from genetic data. The results of that analysis are shown below.

distance=c(2.806063,8.208133,14.03604,19.03151,24.44091, 2.806063, 8.208133,14.03604,19.03151,24.44091,2.806063,8.208133,14.03604,19.03151,24.44091 )    

correlation=c(-0.006933,0.029481,-0.071406,0.038319,-0.049990,0.006267,0.055945,-0.048551,-0.035062,-0.031578,0.022629,-0.065584,0.000986,-0.052754,0.0424931)
component=c(PC1,PC1,PC1,PC1,PC1,PC2,PC2,PC2,PC2,PC2,PC3,PC3,PC3,PC3,PC3)

data1<-data.frame(distance,correlation,component)

I then used ggplot2 to plot the results:

library(ggplot2)
ggplot(data1,aes(x=data1$distance,y=data1$correlation,group=component,colour=component))+theme_classic()+ geom_line(size=1)+geom_point(size=1.5)

What I would now like to do is compute the 95% confidence intervals for each of the principal components and draw them on the plot, using faint shading for the confidence area around each line while keeping the different line colours for the different PCs. Unfortunately, I am completely stuck and don't know how to go about doing this. Any help will be highly appreciated.

Accepted answer by jlhoward

Your code doesn't run as is, which is why no one has bothered to respond for the last 10 hours.

Assuming you mean:

component=c("PC1","PC1","PC1","PC1","PC1","PC2","PC2","PC2","PC2","PC2","PC3","PC3","PC3","PC3","PC3")
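If typing the labels out is tedious, base R's rep() builds the same vectors. A small sketch, assuming (as in the question's data) that all three components share the same five distance classes; dist_classes is just an illustrative name:

```r
# The five distance-class midpoints, shared by all three components
dist_classes <- c(2.806063, 8.208133, 14.03604, 19.03151, 24.44091)

# each = 5 repeats every label five times: PC1 x5, then PC2 x5, then PC3 x5
component <- rep(c("PC1", "PC2", "PC3"), each = 5)

# times = 3 repeats the whole distance vector once per component
distance <- rep(dist_classes, times = 3)

print(component)
print(distance)
```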

and that you want the 95% confidence band for correlation vs. distance, this will provide it:

library(ggplot2)
ggplot(data1,aes(x=distance,y=correlation,color=component))+
  geom_line(size=1)+
  geom_point(size=1.5)+
  stat_smooth(aes(fill=component), alpha=.2,
              method=lm, formula=y~1, se=TRUE, level=0.95)+
  theme_classic()

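The main addition is the stat_smooth(...) line, which fits correlation vs. distance with a linear model that has only a constant term, so the fitted value is simply the mean correlation of each component. The result is a horizontal line per component, and the shaded band is the 95% confidence interval for that mean. Note that level=0.95 and se=TRUE are the defaults, so those arguments are not strictly necessary here.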
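To make it concrete what that band is, here is a minimal sketch that computes the same per-component interval by hand (mean plus or minus t(0.975, n - 1) * sd / sqrt(n), via confint() on an intercept-only lm) and draws it with geom_ribbon() instead of stat_smooth(). It rebuilds data1 with quoted labels so it runs on its own; the helper ci_for() and the objects groups, bands and ribbons are hypothetical names introduced only for illustration.

```r
library(ggplot2)

# Question data, with the component labels quoted
distance <- rep(c(2.806063, 8.208133, 14.03604, 19.03151, 24.44091), times = 3)
correlation <- c(-0.006933, 0.029481, -0.071406, 0.038319, -0.049990,
                 0.006267, 0.055945, -0.048551, -0.035062, -0.031578,
                 0.022629, -0.065584, 0.000986, -0.052754, 0.0424931)
component <- rep(c("PC1", "PC2", "PC3"), each = 5)
data1 <- data.frame(distance, correlation, component)

# Intercept-only model: the fit is the mean, and confint() gives
# mean +/- qt(0.975, n - 1) * sd / sqrt(n)
ci_for <- function(y, level = 0.95) {
  fit <- lm(y ~ 1)
  ci <- confint(fit, level = level)  # 1 x 2 matrix for "(Intercept)"
  data.frame(mean = unname(coef(fit)[1]), lower = ci[1, 1], upper = ci[1, 2])
}

# One row of (mean, lower, upper) per component, in PC1, PC2, PC3 order
groups <- split(data1$correlation, data1$component)
bands <- do.call(rbind, lapply(groups, ci_for))
bands$component <- names(groups)
rownames(bands) <- NULL
print(bands)

# The interval does not depend on distance, so a flat ribbon only needs
# the two ends of the distance range (merge() with no shared columns
# returns every combination: 3 components x 2 endpoints)
ribbons <- merge(bands, data.frame(distance = range(data1$distance)))

p <- ggplot(data1, aes(x = distance, y = correlation, colour = component)) +
  geom_ribbon(data = ribbons,
              aes(x = distance, ymin = lower, ymax = upper, fill = component),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line() +
  geom_point() +
  theme_classic()
print(p)
```

Because the model has no slope term, the band has the same width at every distance; this matches what stat_smooth(method=lm, formula=y~1) draws, just with the numbers exposed in bands.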

Also, the expressions in the call to aes(...) should reference columns of data1 (so x=distance, not x=data1$distance), and you do not need the group=... clause when color=... already uses the same grouping variable.