How can I find the mean and average expression in my violin plot using R?

473 views Asked by At

I have plotted my violin plots, but I am trying to find the mean and average expression at each of my time points.

Here is the code that I have used:

VlnPlot(spinalcord_neurons, features = c("Cx3cl1"), group.by= "time", pt.size = .5,) + theme (axis.text.x=element_text(angle=40, hjust=1)) + theme(legend.position = "right")

And here is the graph that I got:

enter image description here

I am new to using R and not familiar with all of the functions. Thank you for your help!

1

There are 1 answers

0
knitz3 On BEST ANSWER

We can use some example data from the SeuratData package. Generally for comparing expression between classes, we use the RNA assay (or if you have used SCTransform on your data, the SCT assay)

Seurat makes it easy to generate results, but it can be tricky to make sure results are valid. Sorry for the long-winded answer.

I assume by "mean and average expression" you want to get the mean expression of either your counts or normalized data (which may be in the data slot).

# Load library
library(Seurat)

# Load example data
library(SeuratData)
data("ifnb")
ifnb <- UpdateSeuratObject(ifnb)
# We have ident "stim" suitable for comparisons
colnames([email protected])
levels(factor([email protected]$stim))
[1] "orig.ident"         "nCount_RNA"        
[3] "nFeature_RNA"       "stim"              
[5] "seurat_annotations"

[1] "CTRL" "STIM"
# VlnPlot using RNA assay
VlnPlot(ifnb, features = "CXCL10", group.by = "stim", assay = "RNA")

# Mean expression, returns values for each assay in your object
AverageExpression(ifnb, group.by = "stim", features = "CXCL10")
$RNA
             CTRL          STIM
[1,] 1.162095e+63 4.917149e+190

This gave us these crazy high numbers, but from the documentation for AverageExpression():

If slot is set to 'data', this function assumes that the data has been log normalized and therefore feature values are exponentiated prior to averaging so that averaging is done in non-log space. Otherwise, if slot is set to either 'counts' or 'scale.data', no exponentiation is performed prior to averaging

This example dataset is not an integrated object with any sort of workflow done to normalize the data, so the data slot of the RNA assay is currently just holding the counts, same as the counts slot. We can either return average expression of the counts, or ensure that the data slot is log-normalized. If it isn't already, you will want to ensure that your dataset is properly normalized, and integrated if need be.

These vignettes from the Seurat website are extremely helpful and practically required reading if you want to get acquainted with single cell analysis.

# Mean expression of the counts, with sane numbers
AverageExpression(ifnb, group.by = "stim", features = "CXCL10", slot = "counts")


# Mean expression, but we are acting on the `data` slot after normalization and
# log transformation. The default behavior of AverageExpression() exponentiates
# the log-transformed expression first before calculating means

# Since we have normalized the data in addition to log-transforming,
# the results are not exactly the same
ifnb_log <- NormalizeData(ifnb, normalization.method = "LogNormalize")
AverageExpression(ifnb_log, group.by = "stim", features = "CXCL10")
$RNA
          CTRL     STIM
[1,] 0.2965791 30.17246


Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
$RNA
         CTRL     STIM
[1,] 1.141298 104.3836