How to add labels to my volcano plot in R


I am trying to create a volcano plot in R to show differentially expressed genes. I want to label the 10 most significant genes using ggrepel, taking the labels from the gene_name column of the original dataframe ('dat'). My code so far looks like this:

dat <- read_tsv ('/Users/user/Library/CloudStorage/OneDrive-TheUniversityofManchester/Uni/PhD/Studies/JM8 RNAseq stroke study/RNAseq data/JM_RNASeq_Expressionfiltered.tsv')

##### START HERE FOR EACH PLOT #####

## 4h vs sham##

## loop through for each plot
## Order by padj value for desired comparison (important to do this here for labelling, and re-order for each plot)
dat  <- dat[order(dat$JM_4h_vs_sham_padj.A), ]

## obtain a vector describing significance (padj < 0.05 and log2FC outside the range -0.5 < x < 0.5)
## Also directionality of change (- or +)
## new data frame called sigs containing the log2FC and padj data from dat

sigs = data.frame(dat$JM_4h_vs_sham_log2FoldChange.A)
sigs$padj <- dat$JM_4h_vs_sham_padj.A
colnames(sigs) <- c("Log2FC", "padj")


## add new column describing direction of change
## Change nonsignificant genes to "NS"

sigs$dir <- ifelse(sigs$Log2FC > 0, "Up", "Down")
sigs$dir <- ifelse(sigs$padj < 0.05, sigs$dir, "NS")
sigs$dir <- ifelse(sigs$Log2FC > -0.5 & sigs$Log2FC < 0.5, "NS", sigs$dir)

## to label the top 10 genes, create a logical column where the genes to label are TRUE

sigs$lab <- FALSE
sigs[1:10,4] <- TRUE

## plot the volcano plot
## this works well for a panel 45mm wide - export as 5x5 inch PDF and rescale in illustrator
ggplot(data = sigs, aes(x = Log2FC, y = -log10(padj))) +
  geom_point(aes(colour = dir), size = 0.5) +
  geom_text_repel(aes(label = ifelse(sigs$lab == T, as.character(dat$gene_name),"")), size = 5) +
  scale_colour_manual(values = c("#101d99", "grey80", "#c144a1")) +
  theme(panel.background = element_rect(fill = "white", colour = "black", size = 0.5, linetype = "solid"),
        panel.border = element_rect(fill = NA, colour = "black", size = 0.5),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.title.y = element_text(size = 20),
        axis.title.x = element_text(size = 20),
        axis.text.y = element_text(size = 20),
        axis.text.x = element_text(size = 20),
        legend.position = "none") +
  scale_x_continuous(limits = c(-2, 5), breaks = seq(-2, 5, 1)) +
  scale_y_continuous(limits = c(0, 15), breaks = seq(0, 15, 5))

ggsave("Vplot_4h_vs_sham.pdf", plot = last_plot(), 
       path = "/Users/user/Library/CloudStorage/OneDrive-TheUniversityofManchester/Uni/PhD/Studies/JM8 RNAseq stroke study/RNAseq plots",
       scale = 1, width = 5, height = 5, units = c("in"), dpi = 600, limitsize = TRUE)

However, the labels do not appear in my plot (see image):

Volcano plot example

My dataframe 'sigs' looks like this:

    Log2FC         padj dir  lab
1 2.324442 9.878011e-67  Up TRUE
2 4.754250 1.389851e-54  Up TRUE
3 5.213439 4.860466e-49  Up TRUE
4 5.382307 1.127434e-48  Up TRUE
5 1.342379 1.194506e-35  Up TRUE
6 2.468398 2.070574e-35  Up TRUE
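
One thing that may be worth checking against the rows above (a sketch, not a confirmed diagnosis): the plot fixes the axes with scale_x_continuous(limits = c(-2, 5)) and scale_y_continuous(limits = c(0, 15)). With the default out-of-bounds handling, ggplot2 turns values outside scale limits into NA and the layers then drop those rows, which would remove both the points and their labels. The base R check below reuses the six rows printed above; the names sigs_head, x_lim and y_lim are made up for illustration.

```r
# Values copied from the first rows of 'sigs' shown above
sigs_head <- data.frame(
  Log2FC = c(2.324442, 4.754250, 5.213439, 5.382307, 1.342379, 2.468398),
  padj   = c(9.878011e-67, 1.389851e-54, 4.860466e-49,
             1.127434e-48, 1.194506e-35, 2.070574e-35)
)

# Axis limits as passed to scale_x_continuous() / scale_y_continuous()
x_lim <- c(-2, 5)
y_lim <- c(0, 15)

# The y aesthetic in the plot is -log10(padj)
sigs_head$neg_log10_padj <- -log10(sigs_head$padj)

# TRUE only where a point lies inside both axis ranges
sigs_head$in_range <-
  sigs_head$Log2FC >= x_lim[1] & sigs_head$Log2FC <= x_lim[2] &
  sigs_head$neg_log10_padj >= y_lim[1] & sigs_head$neg_log10_padj <= y_lim[2]

print(sigs_head)
```

If in_range is FALSE for the labelled rows, the "Removed ... rows containing missing values" warnings that ggplot2 prints when drawing the plot should confirm it.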

I have a column for gene_name in my original dataframe 'dat'.
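
A possible way to make the labelling less fragile is to keep the gene name in the same data frame that is plotted, so that aes() never has to reach into a second object such as dat$gene_name. The sketch below uses a toy data frame (toy, with made-up values) in place of the real data, and it assumes the ggplot2 and ggrepel packages are installed. It also leaves the upper axis limits open (NA) so that very significant genes are not discarded by fixed limits; if the 0 to 15 range is needed for the figure, oob = scales::squish on the scale is one alternative that keeps out-of-range points at the edge instead of dropping them.

```r
library(ggplot2)
library(ggrepel)

# Toy stand-in for the real data (hypothetical values; only the column roles matter)
set.seed(1)
toy <- data.frame(
  gene_name = paste0("gene", 1:200),
  Log2FC    = rnorm(200, mean = 0, sd = 1.5),
  padj      = 10^(-runif(200, min = 0, max = 60)),
  stringsAsFactors = FALSE
)

# Order by adjusted p-value so the first 10 rows are the top hits
toy <- toy[order(toy$padj), ]

# Gene name for the top 10 rows, empty string for every other row.
# Empty strings draw nothing, but ggrepel still steers labels away from those points.
toy$label <- ""
toy$label[1:10] <- toy$gene_name[1:10]

p <- ggplot(toy, aes(x = Log2FC, y = -log10(padj))) +
  geom_point(colour = "grey60", size = 0.5) +
  # label comes from a column of the plotted data, not from another data frame
  geom_text_repel(aes(label = label), size = 5, max.overlaps = Inf) +
  # lower limit fixed at 0, upper limit taken from the data
  scale_y_continuous(limits = c(0, NA))

print(p)
```

For the real data the same idea would be to copy dat$gene_name into sigs right after sigs is created, and then map label to that column.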

I have also tried the following solution, which I found elsewhere on Stack Overflow, after adding a gene_name column to the sigs dataframe, but the code didn't seem to like the NULL value:

# Identify the top 10 hits
Top_Hits = head(arrange(dat,padj),10) 

# Add column label, containing the gene name for the top hits or nothing for all others
dat$label = if_else(dat$gene_name %in% Top_Hits$gene_name,  
                   dat$gene_name, NULL)
   # use geom_text_repel to show labels (while preventing overlap)
  ggrepel::geom_text_repel(aes(label = label),
                           size = 3, show.legend = FALSE) 
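
A likely reason this snippet fails is that NULL has length zero, so it cannot serve as the "false" branch: dplyr's if_else() and base ifelse() both need a real vector there. A typed missing value such as NA_character_, or an empty string, is the usual substitute. Note also that arrange(dat, padj) assumes a column literally named padj, whereas the column used earlier is JM_4h_vs_sham_padj.A. The sketch below shows the idea in base R on a toy data frame; dat_toy, n_top and top_hits are hypothetical names and the values are made up.

```r
# Toy stand-in for 'dat'; column names follow the question, values are made up
dat_toy <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  JM_4h_vs_sham_padj.A = c(0.20, 1e-30, 0.04, 1e-12, 0.50),
  stringsAsFactors = FALSE
)

n_top <- 2  # the question labels the top 10; 2 keeps the toy example readable

# Names of the n_top genes with the smallest adjusted p-value
top_hits <- head(dat_toy$gene_name[order(dat_toy$JM_4h_vs_sham_padj.A)], n_top)

# Gene name for the top hits, a character NA for everything else (never NULL)
dat_toy$label <- ifelse(dat_toy$gene_name %in% top_hits,
                        dat_toy$gene_name, NA_character_)

print(dat_toy)
```

With dplyr, if_else(condition, gene_name, NA_character_) follows the same pattern. Using "" instead of NA avoids the missing-value warning from the label layer and lets ggrepel keep labels away from the unlabelled points.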