PCA plot on repeats grouped by condition

Question

129 views Asked by itskerry17 At 21 November 2023 at 12:40

I'm trying to produce a PCA plot from expression data covering 3 conditions with 3 repeats each. I've managed to get a plot, but I'm struggling to colour and group the conditions, as I think I may have laid the data out wrong.

I've run the code below but am stuck when I get to colouring each sample. I want to colour by condition (O, FO and F, each containing 3 repeats) and then draw an ellipse around each of the 3 conditions. Any help would be appreciated.

The table:

structure(list(Gene_ID = c("gene-EHS42_RS00005", "gene-EHS42_RS00010", 
"gene-EHS42_RS00015", "gene-EHS42_RS00020", "gene-EHS42_RS00025", 
"gene-EHS42_RS00030", "gene-EHS42_RS00035", "gene-EHS42_RS00040", 
"gene-EHS42_RS00045", "gene-EHS42_RS00050", "gene-EHS42_RS00055", 
"gene-EHS42_RS00060", "gene-EHS42_RS00065", "gene-EHS42_RS00070", 
"gene-EHS42_RS00075", "gene-EHS42_RS00080"), O1 = c(757.784, 
896.264, 123.429, 985.022, 85.8583, 111.718, 10.7002, 152.577, 
17.7682, 1086.55, 2826.57, 109.637, 43.1502, 0, 3158.45, 2271.19
), O2 = c(723, 897.502, 157.31, 1075.96, 106.999, 118.593, 10.8549, 
137.093, 19.2265, 1142.01, 2841.09, 91.1191, 63.1088, 0, 2981.31, 
2136.32), O3 = c(724.17, 875.258, 133.573, 1155.09, 74.4442, 
107.826, 16.365, 164.105, 29.4387, 751.156, 2822.42, 93.7586, 
37.7846, 0, 2978.32, 2045.64), FO1 = c(688.876, 922.35, 135.935, 
1223.9, 119.83, 93.1258, 17.7483, 324.379, 77.5033, 862.804, 
2524.59, 95.5171, 53.9344, 0, 2455.88, 1462.5), FO2 = c(869.985, 
1185.33, 194.729, 882.644, 177.953, 135.183, 21.7251, 296.909, 
58.101, 1247, 2511.67, 114.952, 63.6875, 0, 1433.23, 904.294), 
    FO3 = c(840.392, 1195.88, 165.721, 937.342, 170.775, 145.854, 
    23.9473, 285.05, 44.2553, 1402.51, 2737.45, 100.696, 73.0917, 
    0, 1419.96, 1051.12), F1 = c(1718.91, 1729.51, 341.759, 1324.52, 
    86.4022, 264.029, 30.6917, 169.219, 37.1905, 1987.85, 1370.75, 
    97.2895, 69.3806, 0, 3641.66, 2916.67), F2 = c(1919.41, 1666.16, 
    323.399, 850.732, 67.4236, 271.421, 18.9667, 184.824, 18.0931, 
    1617.57, 1449.76, 86.3241, 48.5885, 0, 2524.14, 1730.51), 
    F3 = c(1951.07, 1850.52, 376.333, 1157.23, 41.8972, 277.754, 
    32.3741, 177.472, 34.1986, 1039.71, 874.081, 78.1316, 58.6108, 
    0, 3424.35, 2758.01)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -16L))

And then the code I ran:

str(PcA_Plot_Data)
head(PcA_Plot_Data)

expression.pca <- prcomp(PcA_Plot_Data[,c(2:10)],
                         centre = TRUE,
                         scale. = TRUE)
summary(expression.pca)

library(ggfortify)
expression.pca.plot <- autoplot(expression.pca,
                                data = PcA_Plot_Data,
                                colour = '')

There is 1 answer

**ATpoint** · Answer 1 · 2023-11-24T09:01:32+00:00

Your layout follows the convention: genes as rows and samples as columns. However, you're running the PCA on untransposed data, so prcomp treats each gene as an observation, whereas I assume you want each sample as a single dot in the final plot. Here is a minimal version of what to do.

Note that I am not checking whether your data needs normalization or a transformation such as log; this only demonstrates how to do PCA on data laid out like this. It's on you to check how to make the data appropriate for such an analysis:

# By convention, gene expression data is a numeric matrix/data.frame with gene IDs as row names, not as a column
data <- as.data.frame(PcA_Plot_Data)
rownames(data) <- data$Gene_ID
data$Gene_ID <- NULL

# Run PCA on transposed data
pca <- prcomp(t(data))

# Parse group names
groups <- gsub("1|2|3", "", colnames(data))

# Plot the sample scores on PC1 and PC2
library(ggplot2)
to_plot <- data.frame(pca$x, group=groups)

ggplot(data=to_plot, aes(x=PC1, y=PC2, color=group)) + geom_point(size=3)
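
The question also asks for an outline around each condition, which the plot above does not draw. The sketch below is one possible way to extend it, under stated assumptions: the log2(x + 1) transform is only an illustrative choice (not a recommendation for this data set), the names `expr`, `scores`, `pct` and `hulls` are made up for this example, and only the first five genes of the question's table are used to keep it self-contained. As far as I can tell, ggplot2's `stat_ellipse()` reports too few points for groups of only three samples, so the sketch draws a convex hull per condition instead of an ellipse.

```r
library(ggplot2)

# First five genes of the question's table, for a self-contained example;
# substitute the full table (gene IDs as row names) in practice.
expr <- data.frame(
  O1  = c(757.784, 896.264, 123.429, 985.022, 85.8583),
  O2  = c(723, 897.502, 157.31, 1075.96, 106.999),
  O3  = c(724.17, 875.258, 133.573, 1155.09, 74.4442),
  FO1 = c(688.876, 922.35, 135.935, 1223.9, 119.83),
  FO2 = c(869.985, 1185.33, 194.729, 882.644, 177.953),
  FO3 = c(840.392, 1195.88, 165.721, 937.342, 170.775),
  F1  = c(1718.91, 1729.51, 341.759, 1324.52, 86.4022),
  F2  = c(1919.41, 1666.16, 323.399, 850.732, 67.4236),
  F3  = c(1951.07, 1850.52, 376.333, 1157.23, 41.8972),
  row.names = paste0("gene-EHS42_RS000", c("05", "10", "15", "20", "25"))
)

# Assumption: log2(x + 1) is used purely for illustration
pca <- prcomp(t(log2(expr + 1)))

# Condition = sample name minus its trailing replicate number
scores <- data.frame(pca$x, group = sub("[0-9]+$", "", colnames(expr)))

# Percentage of variance explained by each component, for the axis labels
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Convex hull per condition (chull() comes from grDevices, loaded by default)
hulls <- do.call(rbind, lapply(split(scores, scores$group), function(d) {
  d[chull(d$PC1, d$PC2), ]
}))

p <- ggplot(scores, aes(x = PC1, y = PC2, colour = group, fill = group)) +
  geom_polygon(data = hulls, alpha = 0.2) +
  geom_point(size = 3) +
  labs(x = paste0("PC1 (", pct[1], "%)"), y = paste0("PC2 (", pct[2], "%)"))
print(p)
```

With more replicates per condition, swapping the `geom_polygon()` layer for `stat_ellipse()` should give the ellipses asked for in the question.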
