Add sample info to dataset in PCA (R)


I'm a biologist, not a programmer, so please be gentle.

So I have a dataset that looks like

Genes  Patient1  Patient2  Patient3
A      324       433       343
B      431       342       124
Z      232       234       267

Then I have a sample sheet that contains sample info like:

Patient1 - Healthy
Patient2 - Disease
Patient3 - Healthy

I am using:

library(ggfortify)
df <- dataset
pca_res <- prcomp(df, scale. = TRUE)

autoplot(pca_res)

Then I want to do

autoplot(pca_res, data = ?, colour = '?')

I wish to use the info from the sample sheet to color my PCA based on the state (healthy/disease) using the autoplot function. Is there a way to do this?

There is 1 answer

Basti On

First, I would create a complete data.frame with all information available.

For example, you will need a data.frame like this one, with patients as rows, genes as columns, and the status as an extra column:

df=structure(list(A = c(324, 433, 343), B = c(431, 342, 124), Z = c(232, 
234, 267), Status = c("Healthy", "Disease", "Healthy")), row.names = c("Patient1", 
"Patient2", "Patient3"), class = "data.frame")

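Then you can use the factoextra package, which is very handy for plotting PCA. Note that prcomp only accepts numeric columns, so the Status column has to be left out of the PCA itself:

library(factoextra)
pca_res <- prcomp(df[, c("A", "B", "Z")], scale. = TRUE) #genes only, Status excluded
fviz_pca_ind(pca_res, habillage=factor(df$Status))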

See the fviz_pca_ind documentation if you want to change the colors afterwards.
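
Since the question asks specifically about autoplot, here is a minimal sketch of the ggfortify route. It assumes the same kind of data.frame as above (patients as rows, a Status column next to the gene columns); gene_cols and p are just illustrative names. In ggfortify, data = takes the data.frame that holds the extra column and colour = takes the name of that column as a string.

```r
library(ggfortify)

# Toy data from the answer: patients as rows, genes as columns, plus Status
df <- data.frame(
  A = c(324, 433, 343),
  B = c(431, 342, 124),
  Z = c(232, 234, 267),
  Status = c("Healthy", "Disease", "Healthy"),
  row.names = c("Patient1", "Patient2", "Patient3")
)

# Run the PCA on the numeric gene columns only
gene_cols <- setdiff(colnames(df), "Status")
pca_res <- prcomp(df[, gene_cols], scale. = TRUE)

# data = supplies the Status column, colour = names it
p <- autoplot(pca_res, data = df, colour = "Status")
print(p)
```

The rows of df have to be in the same order as the rows passed to prcomp, which is the case here because both come from the same data.frame.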

Edit:

To create the whole data.frame from your 2 datasets:

1) Take your first data.frame and use its first column as row names

rownames(df)=df$Genes
df=df[,-1] #remove the gene column in order to keep only the values

2) Format your second data.frame. It should have the same columns as df (Patient1, Patient2, ...) and a single row named "Status" that holds the disease status of each patient. Call it df2:

rownames(df2)=c("Status")
df2

         Patient1  Patient2  Patient3
Status   Healthy   Disease   Healthy

We don't know what your data looks like, so you will have to do this step on your own.

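3) Then rbind df and df2, and transpose the result so that patients become rows

df3=rbind(df,df2)
df3=data.frame(t(df3), stringsAsFactors=FALSE) #patients as rows, genes + Status as columns
genes=setdiff(colnames(df3),"Status")
df3[genes]=lapply(df3[genes],as.numeric) #rbind turned the counts into character

and then you perform the PCA with df3, leaving the Status column out of the prcomp call, since prcomp only accepts numeric columns.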
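
The three steps can also be sketched end to end in base R. This is only an illustration under assumptions: the expression table is rebuilt from the numbers in the question, and the sample sheet is assumed to be a two-column data.frame (the names sample_sheet, Sample and Status are hypothetical, since the real layout is not shown). Instead of rbind, it transposes the numeric part first and then looks up the status by patient name, so the counts never get converted to character.

```r
# Expression table as in the question: genes as rows, patients as columns
expr <- data.frame(
  Genes = c("A", "B", "Z"),
  Patient1 = c(324, 431, 232),
  Patient2 = c(433, 342, 234),
  Patient3 = c(343, 124, 267)
)

# Hypothetical sample sheet layout: one row per patient
sample_sheet <- data.frame(
  Sample = c("Patient1", "Patient2", "Patient3"),
  Status = c("Healthy", "Disease", "Healthy")
)

# Transpose the numeric part: patients become rows, genes become columns
gene_names <- as.character(expr$Genes)
mat <- t(as.matrix(expr[, -1]))
colnames(mat) <- gene_names
df3 <- as.data.frame(mat)

# Look up each patient's status by name, so the row order
# of the sample sheet does not have to match
idx <- match(rownames(df3), as.character(sample_sheet$Sample))
df3$Status <- as.character(sample_sheet$Status)[idx]

# PCA on the gene columns only; Status is kept aside for coloring
pca_res <- prcomp(df3[, gene_names], scale. = TRUE)
```

df3 and pca_res can then be passed to whichever plotting function you prefer, with df3$Status as the grouping variable.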