How to make a line graph plotting gene counts on the y axis and gene lengths on the x axis?

Sorry if this is a stupid question, but I can't wrap my head around it. I want to create a smooth line plot or a histogram for my 8 samples, showing the distribution of the lengths of their gene transcripts (x axis) and how frequently each length appears in each sample (y axis).

I've binned the log10 of my gene lengths, but then I'm stuck. I can't plot a plain histogram, because every sample then looks identical (every gene appears in every sample), and I'm not sure how to include the per-sample expression values in the plot.

Any suggestions would be appreciated!

Example of the data frame:

          Gene.ID Length ND_R1 ND_R2 NP_R1 NP_R2 dD_R1 dD_R2 dP_R1 dP_R2 log10_length log10_length_bin
1 ENSG00000273901   7999    44    48   122    15    79    61    74   107     3.903036                1
2 ENSG00000165392  23499  1246  1851  1065   106  1755  1787  1291  2169     4.371049                3
3 ENSG00000110172  44999   646   969   945    68  1252  1278  1515  2566     4.653203                4
4 ENSG00000148498   9499    21    33    49     3   135   139   113   202     3.977678                1
5 ENSG00000123473  11499   271   460   381    35   585   560   512   892     4.060660                2
6 ENSG00000081721 229335  4461  6963  6068   467  6211  6198  5674  9733     5.360470                7
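
Code so far:

library(dplyr)    # mutate(), filter(), group_by(), summarise(), %>%
library(tidyr)    # pivot_longer()
library(ggplot2)

# Assign each gene to a 0.25-wide bin of log10(length), then drop genes outside the binned range
df <- mutate(df, log10_length_bin = cut(log10_length, breaks = seq(3.75, 5.5, by = 0.25), labels = FALSE))
df <- filter(df, log10_length >= 3.75 & log10_length <= 5.5)

# Reshape to long format: one row per gene per sample
df_long <- tidyr::pivot_longer(df, 
                               cols = starts_with(c("ND_", "dD_", "NP_", "dP_")), 
                               names_to = "Sample", 
                               values_to = "Expression")

# Number of genes per sample and bin (identical for every sample, which is my problem)
counts <- df_long %>%
  group_by(Sample, log10_length_bin) %>%
  summarise(Count = n(), .groups = "drop")

# Histogram attempt (this is where I'm stuck)
ggplot(df, aes(x = log10_length)) +
  geom_histogram(binwidth = 0.1, aes(fill = ND_R1), position = "dodge") +
  labs(x = "Log10 of DoG Length", y = "Frequency", title = "Distribution of Log10 DoG Lengths") +
  scale_fill_discrete(name = "Sample") +
  theme_minimal()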
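
One possible way forward, as a rough sketch rather than a definitive answer: since `n()` counts genes and every gene is present in every sample, the per-sample difference has to come from the expression values. The sketch below assumes that "how frequently a length appears" can be read as "what share of a sample's total counts falls in each length bin", so it sums `Expression` per sample and bin and divides by the sample total. The toy data frame is copied from the six example rows above; `bin_breaks`, `binned`, `Reads`, `Fraction` and `bin_mid` are names made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data: the six example rows from the question
df <- data.frame(
  Gene.ID = c("ENSG00000273901", "ENSG00000165392", "ENSG00000110172",
              "ENSG00000148498", "ENSG00000123473", "ENSG00000081721"),
  Length = c(7999, 23499, 44999, 9499, 11499, 229335),
  ND_R1 = c(44, 1246, 646, 21, 271, 4461),
  ND_R2 = c(48, 1851, 969, 33, 460, 6963),
  NP_R1 = c(122, 1065, 945, 49, 381, 6068),
  NP_R2 = c(15, 106, 68, 3, 35, 467),
  dD_R1 = c(79, 1755, 1252, 135, 585, 6211),
  dD_R2 = c(61, 1787, 1278, 139, 560, 6198),
  dP_R1 = c(74, 1291, 1515, 113, 512, 5674),
  dP_R2 = c(107, 2169, 2566, 202, 892, 9733)
)

# Same bin edges as in the question
bin_breaks <- seq(3.75, 5.5, by = 0.25)

# Long format: one row per gene per sample
df_long <- df %>%
  mutate(log10_length = log10(Length)) %>%
  pivot_longer(cols = matches("^(ND|NP|dD|dP)_R[0-9]+$"),
               names_to = "Sample", values_to = "Expression")

# Sum expression per sample and length bin, then scale by the sample total
binned <- df_long %>%
  mutate(bin = cut(log10_length, breaks = bin_breaks,
                   labels = FALSE, include.lowest = TRUE)) %>%
  filter(!is.na(bin)) %>%                      # drop genes outside the binned range
  group_by(Sample, bin) %>%
  summarise(Reads = sum(Expression), .groups = "drop") %>%
  group_by(Sample) %>%
  mutate(Fraction = Reads / sum(Reads),        # share of the sample's counts in this bin
         bin_mid = bin_breaks[bin] + 0.125) %>% # bin midpoint for the x axis
  ungroup()

# One line per sample
p <- ggplot(binned, aes(x = bin_mid, y = Fraction, colour = Sample)) +
  geom_line() +
  geom_point() +
  labs(x = "log10(length)", y = "Fraction of sample counts", colour = "Sample") +
  theme_minimal()
print(p)
```

Dividing by the sample total is a design choice to make samples with very different overall counts (for example NP_R2) comparable; plotting `Reads` directly would show raw totals instead. If the intended quantity is instead the number of genes expressed above some cutoff, the same pipeline could filter on `Expression` and count rows with `n()`, with the cutoff being another assumption to choose.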