Plotting with gggenomes (ggplot for genes) : change the color of labels


Dear Stackoverflowers,

I am currently coding an application that displays sequences and genes of 11 species of nematodes (such as C. elegans).

I am using R Shiny together with the gggenomes package, which you can think of as ggplot for displaying alignments between genes across several sequences.

gggenomes works from three data frames: seqs, genes and links.

The data frames contain the following columns:

- genes: seq_id, start, end, length, Orthogroup

- seqs: seq_id, start, end, length

- links: seq_id, start, end, seq_id2, start2, end2
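To make the expected shape of the three inputs concrete, here is a minimal sketch in base R. All values are made up for illustration; only the column names come from the description above (with Orthogroup capitalised to match the plotting call, and end2 assumed as the last links column).

```r
# Toy inputs with invented coordinates; only the column layout matters here.
seqs <- data.frame(
  seq_id = c("elegans IV", "briggsae IV"),
  start  = c(1, 1),
  end    = c(10000, 12000),
  length = c(10000, 12000)
)

genes <- data.frame(
  seq_id     = c("elegans IV", "briggsae IV"),
  start      = c(1000, 2000),
  end        = c(3000, 4500),
  length     = c(2001, 2501),
  Orthogroup = c("OG1", "OG1")   # hypothetical orthogroup id
)

# One link joining the region on the first sequence to the second one.
links <- data.frame(
  seq_id  = "elegans IV",  start  = 1000, end  = 3000,
  seq_id2 = "briggsae IV", start2 = 2000, end2 = 4500
)

str(links)
```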

Here is an example :

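    library(ggplot2)
    library(gggenomes)

    p <- gggenomes(seqs = seqs, genes = genes, links = links) +
      geom_seq() +                                        # sequence backbones
      geom_gene(aes(fill = Orthogroup), stroke = 0.5) +   # genes filled by orthogroup
      geom_bin_label(fontface = "italic", size = 5, expand_left = 0.8) +  # names left of each sequence
      geom_link(offset = 0.25) +                          # links between sequences
      theme(axis.text.x = element_text(size = 15)) +
      labs(fill = "Orthogroups")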

Since it works like ggplot, it also uses geoms and aesthetics (aes).

One last thing you need to know:

geom_bin_label is a geom that takes the seq_id column from the seqs data frame and plots the sequence name to the left of each sequence.

Here is a plot generated with gggenomes using geom_bin_label (image: Synteny plot using gggenomes).

So, on the plot, the seq_ids are constructed like this : "species_name sequence_name".

Example : "bovis CBOVI.ctg00005_chrIV"

WHAT I WANT TO DO

  1. Align the species_name and align the sequence_name so they form two nice columns on the plot.

For example:

bovis                 CBOVI.ctg00005_chrIV    sequences here...
becei             CSP29.scaffold174_cov172    ...
panamensis         CSP28.scaffold107_cov92    ...
inopinata                        SP34_chr4    ...
elegans                                 IV    ...
tropicalis                     Scaffold629    ...
remanei                                 IV    ...
latens                         scaffold_77    ...
tribulationis          CSP40_scaffold02881    ...
briggsae                                IV    ...
nigoni                          CM008512.1    ...

Reminder : in the seq_id column the seq_ids are written like this : "species sequence".
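As a starting point for the alignment, the two parts of each seq_id can be split and padded to fixed widths in base R. This is only a sketch: it assumes the species name never contains a space, uses three of the seq_ids from the example above, and the padded labels only line up when drawn in a monospaced font (for example family = "mono" in a ggplot2 text layer).

```r
# A few seq_ids in the "species sequence" format described above
seq_ids <- c("bovis CBOVI.ctg00005_chrIV", "elegans IV", "nigoni CM008512.1")

# Split at the first space: species before it, sequence name after it
species  <- sub(" .*$", "", seq_ids)
seq_name <- sub("^[^ ]* ", "", seq_ids)

# Left-justify the species, right-justify the sequence names
species_pad <- formatC(species,  width = max(nchar(species)), flag = "-")
seq_pad     <- formatC(seq_name, width = max(nchar(seq_name)))

# Every label now has the same number of characters
labels <- paste(species_pad, seq_pad, sep = "  ")
cat(labels, sep = "\n")
```

The padded labels could then replace the original seq_id values before plotting, provided the genes and links data frames are updated to use the same ids.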

  2. Color the species_name in red and the sequence_name in blue (the colors are arbitrary, I just want to display the two parts in different colors).
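One possible direction for the two colors, sketched in plain ggplot2 rather than gggenomes: draw each part of the label as its own geom_text layer, one left-justified and one right-justified, each with a fixed color. The data frame labs_df, its y positions and the 0 to 1 x range are invented for illustration. Whether the same two layers can be added directly to a gggenomes plot in place of geom_bin_label is not verified here.

```r
library(ggplot2)

# Hypothetical label table: one row per sequence, y gives the track position
labs_df <- data.frame(
  species  = c("bovis", "elegans", "nigoni"),
  seq_name = c("CBOVI.ctg00005_chrIV", "IV", "CM008512.1"),
  y        = 3:1
)

p_labels <- ggplot(labs_df, aes(y = y)) +
  # species column: left-justified at x = 0, red, italic
  geom_text(aes(x = 0, label = species), hjust = 0,
            colour = "red", fontface = "italic") +
  # sequence column: right-justified at x = 1, blue
  geom_text(aes(x = 1, label = seq_name), hjust = 1, colour = "blue") +
  xlim(0, 1) +
  theme_void()

p_labels
```

Because each column is justified with hjust, this variant does not depend on a monospaced font.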

I hope you can help me. It looks like a simple display problem, but it is actually quite tricky.

Here are some links that may help:

https://thackl.github.io/gggenomes/reference/index.html

https://thackl.github.io/gggenomes/reference/geom_bin_label.html

https://ggplot2.tidyverse.org/
