Add breaks in ggplot2 x axis


I am plotting coverage against position with some very simple code:

library(ggplot2)

positions <- 1:200000
coverage <- rep(0, length(positions))
coverage[1:200] <- 2000
coverage[30001:30100] <- 5000
coverage[50001:50100] <- 500
coverage[170001:170300] <- 500
cov <- data.frame(position = positions, coverage = coverage)


ggplot(data = cov, aes(x = position, y = coverage)) +
  geom_line() +
  xlab("Position") +
  ylab("Coverage") +
  ggtitle("Coverage vs. Position")

This leaves me with something looking like this:

[Image: line plot of coverage vs. position]

The issue is that the regions with high coverage are separated by very long stretches of zero coverage, so the peaks are barely visible. I would like to shorten those zero stretches so that the covered regions stand out, for example by cutting the x axis wherever there are more than 100 consecutive zeros. Is this possible? Thanks in advance!


There are 2 answers

Jon Spring On

My approach here is to take a rolling average over a window of +/- bandwidth positions, start a new "section" every time that average either becomes or stops being zero, and filter out the boring all-zero sections. That leaves views of the areas around the spikes.

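library(dplyr)
library(ggplot2)

bandwidth <- 200  # half-width of the rolling window, in positions
cov %>%
  # rolling mean over +/- bandwidth positions around each point
  mutate(avg = slider::slide_dbl(coverage, mean,
                                 .before = bandwidth, .after = bandwidth)) %>%
  # start a new section whenever the average switches between zero and non-zero
  mutate(section = cumsum((avg > 0) != lag(avg > 0, default = FALSE))) %>%
  # drop the all-zero sections
  filter(avg != 0) %>%
  ggplot(aes(position, coverage)) +
  geom_line() +
  facet_wrap(~section, scales = "free_x")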

[Image: faceted plots of coverage vs. position, one panel per section]
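To see what the section step does, here is a minimal base R sketch on a made-up vector of averages (the values in avg are purely illustrative, not taken from the question's data). It compares each "is non-zero" flag with the previous one and takes the cumulative sum of the changes, which is the same idea as the cumsum/lag line above.

```r
# Hypothetical rolling averages: two zero stretches and two spikes
avg <- c(0, 0, 3, 5, 0, 0, 2, 0)

nonzero <- avg > 0
# Previous element's flag, with FALSE standing in before the first element
prev <- c(FALSE, head(nonzero, -1))

# The id goes up by one each time the flag flips
section <- cumsum(nonzero != prev)
print(section)
# 0 0 1 1 2 2 3 4
```

With a leading zero stretch, the non-zero sections end up with odd ids (1 and 3 here), and those are the ones that survive the filter.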

akash87 On

This was a bit of a doozy!

Before you can plot it, you need to find the runs of consecutive zeros and work out how long each one is.

library(tidyverse)  # loads dplyr and ggplot2
library(ggbreak)

breakers <- cov %>%
  mutate(gr = cumsum(coverage == 0),  # running count of zero positions
         # run id: goes up by one whenever coverage changes value
         gs = cumsum(coverage != lag(coverage, default = 0))) %>%
  group_by(gs, coverage) %>%
  summarise(min_pos = min(position),
            max_pos = max(position),
            min_gr = min(gr),
            max_gr = max(gr),
            .groups = "drop") %>%
  mutate(diff_pos = max_pos - min_pos, diff_gr = max_gr - min_gr) %>%
  filter(coverage == 0)  # keep only the runs of zeros
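As an alternative sketch, the same runs of zeros can be found in base R with rle(), which also makes it easy to apply the "more than 100 consecutive zeros" threshold from the question. The names runs, starts, ends and gaps are just illustrative, and the data is rebuilt here so the snippet runs on its own.

```r
# Rebuild the coverage vector from the question
coverage <- rep(0, 200000)
coverage[1:200] <- 2000
coverage[30001:30100] <- 5000
coverage[50001:50100] <- 500
coverage[170001:170300] <- 500

# Run-length encode the "is zero" flag
runs <- rle(coverage == 0)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only runs of zeros longer than 100 positions
keep <- runs$values & runs$lengths > 100
gaps <- data.frame(start = starts[keep], end = ends[keep],
                   length = runs$lengths[keep])
print(gaps)
```

For this data, gaps has four rows whose start and end match min_pos and max_pos in breakers.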

Then you can plot based on this. To be clear, I could not find a way to add the breaks programmatically, so each one is written out by hand.

ggplot(data = cov, aes(x = position, y = coverage)) +
  geom_line() +
  xlab("Position") +
  ylab("Coverage") +
  ggtitle("Coverage vs. Position") +
  scale_x_break(breaks = c(breakers$min_pos[1], breakers$max_pos[1]), scales = 'free') +
  scale_x_break(breaks = c(breakers$min_pos[2], breakers$max_pos[2]), scales = 'free') +
  scale_x_break(breaks = c(breakers$min_pos[3], breakers$max_pos[3]), scales = 'free') +
  scale_x_break(breaks = c(breakers$min_pos[4], breakers$max_pos[4]), scales = 'free')

The final graph will look like this:

[Image: coverage vs. position plot with breaks in the x axis]
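One possible way to avoid writing out each scale_x_break() call is to fold over the rows with Reduce(), adding one break per row. This is an untested sketch rather than a confirmed ggbreak idiom: it assumes ggbreak accepts breaks added one at a time in a loop exactly as it does when they are chained by hand. The breaks data frame below is hypothetical and simply hard-codes the four zero runs from the question's data (the min_pos and max_pos values of breakers) so the snippet is self-contained.

```r
library(ggplot2)
library(ggbreak)

# Data from the question
positions <- 1:200000
coverage <- rep(0, length(positions))
coverage[1:200] <- 2000
coverage[30001:30100] <- 5000
coverage[50001:50100] <- 500
coverage[170001:170300] <- 500
cov <- data.frame(position = positions, coverage = coverage)

# Zero runs for this data, written out by hand (assumption: same values
# as breakers$min_pos and breakers$max_pos)
breaks <- data.frame(min_pos = c(201, 30101, 50101, 170301),
                     max_pos = c(30000, 50000, 170000, 200000))

p <- ggplot(cov, aes(x = position, y = coverage)) +
  geom_line() +
  xlab("Position") +
  ylab("Coverage") +
  ggtitle("Coverage vs. Position")

# Add one scale_x_break() per row, starting from the base plot
p_broken <- Reduce(
  function(plot, i) {
    plot + scale_x_break(breaks = c(breaks$min_pos[i], breaks$max_pos[i]),
                         scales = "free")
  },
  seq_len(nrow(breaks)),
  init = p
)

print(p_broken)
```

If this works as assumed, swapping the hand-written breaks for the breakers data frame would make the number of breaks follow the data.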