How do I add symbols in a PCA biplot using ggplot2?


I have a dataset with presence-absence (1/0) data and concentrations of 13 heavy metals from 70 ponds. I'm trying to assess which heavy metals affect newt presence, so I made a PCA biplot. I want to show in which ponds newts are present or absent by plotting the ponds with different symbols (e.g. circles, triangles), so that I can see which heavy metals are more associated with presence or absence. I'm working in RStudio. Any help would be much appreciated. Thank you in advance for your answers!

I used the following code to create the PCA biplot, but could not find out how to add symbols to it.

# Combining the heavy metal columns for the PCA in a new data frame
heavymetals <- cbind(Newts[, 24:38], Newts[, 40:46])
heavymetals

# Spearman correlations between the heavy metals
cor(heavymetals, method = 'spearman')

# Creating biplot
library(vegan)
heavymetals_model <- rda(heavymetals, scale = TRUE)
biplot(heavymetals_model)

summary(heavymetals_model)
screeplot(heavymetals_model)

Next, I tried to make a PCA biplot with my limited knowledge of ggplot2 and asked an AI for some help, but it didn't get me anywhere...


There are 2 answers

1
Seth On

Without the Newts data or PCA object, we'll start by creating an example.

Load packages

library(palmerpenguins) # Data for this example
library(ggplot2)
library(dplyr)
library(tidyr)
library(broom)
library(ggrepel)

Principal Component Analysis on Palmer Penguins data

glimpse(penguins)
#> Rows: 344
#> Columns: 8
#> $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
#> $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
#> $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
#> $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
#> $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
#> $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
#> $ sex               <fct> male, female, female, NA, female, male, female, male…
#> $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

peng_pca <- prcomp(~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g,
                   data = penguins, scale. = TRUE)

peng_pca
#> Standard deviations (1, .., p=4):
#> [1] 1.6594442 0.8789293 0.6043475 0.3293816
#> 
#> Rotation (n x k) = (4 x 4):
#>                          PC1          PC2        PC3        PC4
#> bill_length_mm     0.4552503 -0.597031143 -0.6443012  0.1455231
#> bill_depth_mm     -0.4003347 -0.797766572  0.4184272 -0.1679860
#> flipper_length_mm  0.5760133 -0.002282201  0.2320840 -0.7837987
#> body_mass_g        0.5483502 -0.084362920  0.5966001  0.5798821

Create a scatterplot of the first two PCs

peng_pca %>%
  augment(newdata = penguins) %>%
  ggplot(aes(x = .fittedPC1,
             y = .fittedPC2,
             color = species)) +
  geom_point() +
  theme_bw() +
  coord_equal()
#> Warning: Removed 2 rows containing missing values (`geom_point()`).

Create a plot of the loadings

tidy(peng_pca, matrix = 'v') %>%
  mutate(PC = paste0('PC', PC)) %>%
  pivot_wider(names_from = PC, values_from = value) %>%
  ggplot() +
  geom_segment(aes(x = 0, y = 0, xend = PC1, yend = PC2)) +
  geom_label_repel(aes(x = PC1, y = PC2, label = column)) +
  coord_equal() +
  theme_bw()

Created on 2023-12-12 with reprex v2.0.2
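
The two plots above can also be drawn on top of each other to get a single biplot, with a different symbol per group. This is only a minimal sketch and not part of the reprex: it recomputes the PCA on complete rows so the scores line up with the grouping column, and `arrow_scale` is an arbitrary display factor I made up to make the loading arrows visible next to the scores.

```r
library(palmerpenguins) # example data, as above
library(ggplot2)

# Keep complete rows only so the scores line up with the grouping column
peng <- na.omit(penguins[, c("species", "bill_length_mm", "bill_depth_mm",
                             "flipper_length_mm", "body_mass_g")])

pca <- prcomp(peng[, -1], scale. = TRUE)

# Observation scores on PC1/PC2 plus the grouping variable
scores <- data.frame(pca$x[, 1:2], species = peng$species)

# Loadings, stretched by an arbitrary factor (a display choice, not a statistic)
arrow_scale <- 3
loadings <- data.frame(variable = rownames(pca$rotation),
                       pca$rotation[, 1:2] * arrow_scale)

biplot_gg <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point(aes(shape = species, colour = species)) +   # one symbol per group
  geom_segment(data = loadings,
               aes(x = 0, y = 0, xend = PC1, yend = PC2),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_text(data = loadings, aes(label = variable), vjust = -0.5) +
  coord_equal() +
  theme_bw()

biplot_gg
```

For the newt data, the grouping column would be the presence-absence variable instead of `species`, converted to a factor first.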

0
Seppe On

After some additional searching I found something that answers my question. To visualise in which ponds newts are present or absent in my PCA biplot, I used the following code.

# Conducting a PCA using the prcomp() function instead of the rda() function
heavymetals_pca<-prcomp(heavymetals,scale. = TRUE)

# Code for the actual plot
library(ggfortify) 
autoplot(heavymetals_pca, data = Newts, colour = 'Newts_present',
         loadings = TRUE, loadings.colour = 'blue',
         loadings.label = TRUE, loadings.label.size = 3)

I hope this also helps anyone who has the same problem I had. Result: a PCA biplot with presence and absence clearly visualised.
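
Since the question asked for symbols rather than colours, here is a minimal sketch of one way to get them with plain ggplot2. The data are simulated because the real `Newts` data are not available here: `ponds`, `newts_present` and the three metal columns are made-up stand-ins. The main point is that a 0/1 column is numeric, and the shape aesthetic needs a discrete variable, so it is converted to a factor first.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the real data: 70 ponds, 3 made-up metal columns
ponds <- data.frame(
  newts_present = rbinom(70, 1, 0.5),
  Cd = rlnorm(70),
  Pb = rlnorm(70),
  Zn = rlnorm(70)
)

pca <- prcomp(ponds[, c("Cd", "Pb", "Zn")], scale. = TRUE)

# Shape needs a discrete variable, so turn the 0/1 column into a factor
plot_df <- data.frame(
  pca$x[, 1:2],
  newts = factor(ponds$newts_present, levels = c(0, 1),
                 labels = c("absent", "present"))
)

p <- ggplot(plot_df, aes(x = PC1, y = PC2, shape = newts, colour = newts)) +
  geom_point(size = 2.5) +
  # open circle for absence, filled triangle for presence
  scale_shape_manual(values = c(absent = 1, present = 17)) +
  coord_equal() +
  theme_bw()

p
```

Loading arrows can be added on top with `geom_segment()` using the columns of `pca$rotation`, if a full biplot is wanted.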