autoplot in ggfortify - only plotting some values for PCA

I'm following this documentation (https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html) to run PCA on the iris data set.

library(ggfortify)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)

autoplot(pca_res, data = iris, colour = 'Species')

I ran the above code and got three clusters colored by species. I only want to plot one specific species. How can I plot only the points where the species is setosa?

There is 1 answer

stefan On

As the object returned by autoplot is a ggplot object, one option is to manually filter the data passed to the geom_point layer under the hood. In your case this is quite simple, as the plot has only one layer. We can access that layer via p$layers[[1]] and the data it uses via p$layers[[1]]$data.

library(ggfortify)
#> Loading required package: ggplot2

df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)

p <- autoplot(pca_res, data = iris, colour = 'Species')

p$layers[[1]]$data <- p$layers[[1]]$data[p$layers[[1]]$data$Species == "setosa", ]

p
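To check that the plot really has a single point layer before overwriting its data, the layer list can be inspected first. This is only a quick sanity check on a freshly built plot; the names p_check and layer_data are placeholders introduced here, and the layout of p$layers may differ between ggplot2 and ggfortify versions.

```r
library(ggfortify)

df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)

# A fresh, unfiltered plot used only for inspection (hypothetical name)
p_check <- autoplot(pca_res, data = iris, colour = "Species")

# Number of layers in the plot
length(p_check$layers)

# Class of the geom used by the first layer
class(p_check$layers[[1]]$geom)

# Data attached to the first layer: columns and counts per species
layer_data <- p_check$layers[[1]]$data
colnames(layer_data)
table(layer_data$Species)
```

If more than one layer shows up (for example after adding loadings or labels), each layer's data would need to be filtered separately.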

EDIT: Another option would be to "remove" the undesired categories by making them invisible. This way the original scale of the axes is preserved. Of course, it would also be possible to use the filtering approach and additionally set the axis limits to those of the unfiltered data.

p + 
  scale_color_manual(values = c("versicolor" = "transparent", "virginica" = "transparent", setosa = "red"), breaks = "setosa")
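The filtering approach combined with the original axis ranges could look roughly like the sketch below. It assumes the point layer is the first layer, reads the ranges from ggplot_build() before filtering, and reapplies them with coord_cartesian(); the names built, x_range, y_range and p_setosa are introduced here for illustration only.

```r
library(ggfortify)

df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)

p <- autoplot(pca_res, data = iris, colour = "Species")

# Read the x/y ranges of the full plot before any filtering
built <- ggplot_build(p)
x_range <- range(built$data[[1]]$x)
y_range <- range(built$data[[1]]$y)

# Keep only the setosa rows in the point layer
layer_data <- p$layers[[1]]$data
p$layers[[1]]$data <- layer_data[layer_data$Species == "setosa", ]

# Reapply the ranges of the unfiltered data so the axes do not shrink
p_setosa <- p + coord_cartesian(xlim = x_range, ylim = y_range)

p_setosa
```

Unlike the transparent-colour approach, the other species are not drawn at all here, so they cannot overlap or hide the setosa points.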