I'm a biologist, not a programmer, so please be gentle.
I have a dataset that looks like this:
Genes Patient1 Patient2 Patient3
A 324 433 343
B 431 342 124
Z 232 234 267
Then I have a sample sheet that contains sample information like:
Patient1 - Healthy
Patient2 - Disease
Patient3 - Healthy
I am using:
library(ggfortify)
df <- dataset
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res)
Then I want to do
autoplot(pca_res, data = ?, colour = '?')
I wish to use the info from the sample sheet to color my PCA based on the state (healthy/disease) using the autoplot function. Is there a way to do this?
First, I would create a complete data.frame with all information available.
For example, you will need to create a data.frame of this kind:
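A minimal sketch of what such a data.frame might look like, built from the example values in the question. The layout (one row per patient, one column per gene) and the names df_full and Status are assumptions, not something the question specifies:

```r
# One row per patient, one column per gene, plus the disease status.
# Numbers are the example values from the question.
# `df_full` and `Status` are assumed names used only for illustration.
df_full <- data.frame(
  A = c(324, 433, 343),
  B = c(431, 342, 124),
  Z = c(232, 234, 267),
  Status = c("Healthy", "Disease", "Healthy"),
  row.names = c("Patient1", "Patient2", "Patient3")
)

df_full
```

With the patients as rows, the PCA draws one point per patient, and the Status column is available to the plotting function for colouring.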
After that, you could use the factoextra package, which is very handy for plotting PCA. You can check the fviz_pca_ind documentation to modify the colors afterwards.
Edit:
To create the whole data.frame from your two datasets:
1) Take your first data.frame and use its first column as row names.
2) Format your second data.frame so that it has the same columns as df (Patient1, Patient2, ...), each holding that patient's disease status, and call it df2.
We don't know your data, so you will have to do this on your own.
3) Then rbind df and df2 into df3,
and then perform the PCA with df3 (prcomp only accepts numeric values, so the status must be left out of the columns you pass to it).
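The steps above can be sketched roughly as follows, using the example values from the question. Note one deliberate substitution: instead of rbind, which would turn the numeric columns into text once a row of status labels is added, this sketch transposes the table so that patients are rows and adds the status as an extra column. The sample sheet column names Sample and Status are assumptions, since the question does not show how the sample sheet is stored:

```r
library(ggfortify)  # provides autoplot() for prcomp objects

# Expression table as in the question: genes in rows, patients in columns.
dataset <- data.frame(
  Genes    = c("A", "B", "Z"),
  Patient1 = c(324, 431, 232),
  Patient2 = c(433, 342, 234),
  Patient3 = c(343, 124, 267)
)

# Sample sheet; the column names `Sample` and `Status` are assumptions.
samples <- data.frame(
  Sample = c("Patient1", "Patient2", "Patient3"),
  Status = c("Healthy", "Disease", "Healthy")
)

# 1) Use the first column as row names and drop it from the data.
df <- dataset[, -1]
rownames(df) <- dataset$Genes

# 2) Transpose so that each patient is a row, then attach the status,
#    matching patients by name rather than by position.
df3 <- as.data.frame(t(df))
df3$Status <- samples$Status[match(rownames(df3), samples$Sample)]

# 3) Run the PCA on the numeric gene columns only.
gene_cols <- setdiff(colnames(df3), "Status")
pca_res <- prcomp(df3[, gene_cols], scale. = TRUE)

# Colour the points by status; `colour` names a column of `data`.
autoplot(pca_res, data = df3, colour = "Status")
```

This fills in the two question marks from the question: data is the data.frame that holds the status column, and colour is the name of that column. If you prefer factoextra, a call along the lines of fviz_pca_ind(pca_res, habillage = factor(df3$Status)) should give a comparable plot.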