I have some very large files that contain a genomic position (position) and a corresponding population genetic statistic (value). I have successfully plotted these values and would like to color-code the top 5% (blue) and top 1% (red) of values. I am wondering if there is an easy way to do this in R.
I have explored writing a function that defines the quantiles; however, many of the quantiles end up not being unique, which causes the function to fail. I've also looked into stat_quantile, but I only managed to use it to plot lines marking the 95% and 99% quantiles (and some of the lines were diagonal, which did not make any sense to me). (Sorry, I am new to R.)
Any help would be much appreciated.
Here is my code (the files are very large):
library(ggplot2)

# Combine data from multiple files into one long data frame
fst <- rbind(
  data.frame(key = "a1-a3", position = a1.3$V2, value = a1.3$V3),
  data.frame(key = "a1-a2", position = a1.2$V2, value = a1.2$V3),
  data.frame(key = "a2-a3", position = a2.3$V2, value = a2.3$V3),
  data.frame(key = "b1-b2", position = b1.2$V2, value = b1.2$V3),
  data.frame(key = "c1-c2", position = c1.2$V2, value = c1.2$V3)
)

# The plot
theme_set(theme_bw(base_size = 16))
p1 <- ggplot(fst, aes(x = position, y = value)) +
  geom_point() +
  facet_wrap(~key) +
  ylab("Fst") +
  xlab("Genomic Position (Mb)") +
  # Label the x axis in Mb: a break every 1 Mb
  scale_x_continuous(breaks = c(1e+06, 2e+06, 3e+06, 4e+06),
                     labels = c("1", "2", "3", "4")) +
  scale_y_continuous(limits = c(0, 1)) +
  theme(plot.background = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        legend.position = "none",
        legend.title = element_blank())
p1
This is how I would approach it: create a factor that defines which group each observation is in, then map colour to that factor. First, some data to work with!
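Since the real files are not available here, below is a small simulated stand-in, assuming made-up data is acceptable for demonstration. It uses the same five comparison keys and column names as the question; positions are spread over roughly 0 to 4 Mb and the Fst-like values are drawn from a Beta(1, 10) distribution, an arbitrary choice that just gives mostly small values between 0 and 1. None of these numbers come from the original data.

```r
# Simulated stand-in for the real data (made-up values, for illustration only)
set.seed(42)
keys <- c("a1-a3", "a1-a2", "a2-a3", "b1-b2", "c1-c2")
n_per_key <- 1000

fst <- data.frame(
  key      = rep(keys, each = n_per_key),
  # Positions spread evenly over roughly 0-4 Mb for each comparison
  position = rep(round(seq(1, 4e6, length.out = n_per_key)), times = length(keys)),
  # Fst-like values: mostly small, all between 0 and 1
  value    = rbeta(n_per_key * length(keys), shape1 = 1, shape2 = 10)
)
head(fst)
```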
There are plenty of ways to build a factor that records which group each observation falls into; here is one:
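One possible version is sketched below; it is not necessarily the exact code the answer originally used, and `assign_quant_group()` is a hypothetical helper name. It compares each value against the 95th and 99th percentiles with nested ifelse() instead of passing the percentiles to cut() as breaks. That matters because cut() stops with "'breaks' are not unique" when two cutoffs coincide, which is a likely cause of the failure described in the question.

```r
# Hypothetical helper: label each value by which upper-tail group it falls in
assign_quant_group <- function(x, probs = c(0.95, 0.99)) {
  # Cutoffs for the top 5% and top 1%; ties may make them equal, which is fine here
  cutoffs <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  group <- ifelse(x >= cutoffs[2], "99+",
                  ifelse(x >= cutoffs[1], "95-99", "below 95"))
  # Fixed level order so colours and legends stay consistent
  factor(group, levels = c("below 95", "95-99", "99+"))
}

# Small worked example on the values 1..100
quant <- assign_quant_group(1:100)
table(quant)
```

On the values 1 to 100 this puts 95 values in "below 95", 4 in "95-99" and 1 in "99+". On the real data it would be used as `fst$quant <- assign_quant_group(fst$value)`, which pools the cutoffs across all comparisons. With heavily tied values, everything equal to a cutoff lands in the higher group, so a group can hold more than 5% or 1% of the points.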
So quant now indicates whether an observation is in the 95-99 or 99+ group. The colour of the points in a plot can then easily be mapped to quant.
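A minimal sketch of the plotting step follows, assuming ggplot2 is installed and that the colours from the question are wanted (blue for the top 5%, red for the top 1%, black otherwise). To stay self-contained it rebuilds a simulated `fst` with made-up values and the `quant` column; for the real data, the parts to carry over into the existing `p1` code are the `quant` column, the `colour = quant` mapping and the `scale_colour_manual()` call.

```r
library(ggplot2)

# Simulated stand-in data (made-up values, for illustration only)
set.seed(42)
fst <- data.frame(
  key      = rep(c("a1-a3", "a1-a2", "a2-a3", "b1-b2", "c1-c2"), each = 1000),
  position = rep(round(seq(1, 4e6, length.out = 1000)), times = 5),
  value    = rbeta(5000, shape1 = 1, shape2 = 10)
)

# Classify each point against the pooled 95th and 99th percentiles
cutoffs <- quantile(fst$value, probs = c(0.95, 0.99), names = FALSE)
fst$quant <- factor(
  ifelse(fst$value >= cutoffs[2], "99+",
         ifelse(fst$value >= cutoffs[1], "95-99", "below 95")),
  levels = c("below 95", "95-99", "99+")
)

# Points are drawn in row order, so sorting by group draws red points last (on top)
fst <- fst[order(fst$quant), ]

# Map point colour to the group: black for the bulk, blue for top 5%, red for top 1%
p1 <- ggplot(fst, aes(x = position, y = value, colour = quant)) +
  geom_point() +
  scale_colour_manual(values = c("below 95" = "black", "95-99" = "blue", "99+" = "red")) +
  facet_wrap(~key) +
  ylab("Fst") +
  xlab("Genomic Position (Mb)") +
  scale_x_continuous(breaks = c(1e+06, 2e+06, 3e+06, 4e+06),
                     labels = c("1", "2", "3", "4"))
p1
```

Sorting by `quant` is optional; without it, the few red and blue points can end up hidden underneath the much larger number of black points.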