I have a sparse matrix of dimension 33694 x 10000 (genes x cells), which I converted into a data frame and then coerced into a matrix. I am using DESCEND and SOUP (semisoft clustering of single-cell data) to select highly variable genes from the matrix and then perform clustering.
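Because the data started out sparse, going through a data frame into a dense matrix is expensive: each dense double copy of a 33694 x 10000 matrix takes about 2.7 GB. The following is a minimal sketch, on toy data, of keeping counts in the `dgCMatrix` format from the `Matrix` package and comparing memory use. It assumes `Matrix` is installed, which is normally the case because it ships as a recommended package with standard R. Whether `selectGenes` accepts a sparse matrix directly is an assumption I have not verified. If it does not, the sparse copy can still be used for inspection and filtering, and only the genes you keep need to be densified.

```r
library(Matrix)

set.seed(1)
# Toy count matrix: 2000 genes x 500 cells, roughly 95% zeros
n_genes <- 2000
n_cells <- 500
dense <- matrix(rpois(n_genes * n_cells, lambda = 0.05),
                nrow = n_genes, ncol = n_cells)

# Compressed sparse column format stores only the non-zero entries
sparse <- Matrix(dense, sparse = TRUE)

print(class(sparse))
print(format(object.size(dense), units = "MB"))
print(format(object.size(sparse), units = "MB"))

# Transposing keeps the matrix sparse (cells in rows, genes in columns)
sparse_t <- t(sparse)
print(dim(sparse_t))
```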
The function "selectGenes" from SOUP is taking much longer than before. Earlier, the same task (on a different matrix with the same dimensions) finished in two days. I have checked my matrix for missing values (NAs) and it seems fine. Maybe I need to look for some other kind of problem in my matrix, because the test data runs fine and finishes in the usual time.
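Beyond NAs, several other properties matter when a matrix is passed with `type="count"`:

- the storage mode after `as.matrix()`, because a single non-numeric column turns the whole result into a character matrix;
- infinite or negative values;
- non-integer entries, for example if the file held normalised values rather than raw counts;
- cells with zero total counts.

The helper `check_count_matrix` is hypothetical and is not part of SOUP or DESCEND. Running it on both this matrix and the one that finished earlier would show whether they differ in any of these respects.

```r
# Hypothetical helper: summarise properties that matter for a raw count matrix
# with cells in rows and genes in columns (as in SP1_mat_transposed).
check_count_matrix <- function(x) {
  numeric_ok <- is.numeric(x)  # FALSE for a character matrix from a bad coercion
  list(
    storage_mode  = storage.mode(x),
    dims          = dim(x),
    n_na          = sum(is.na(x)),
    n_infinite    = if (numeric_ok) sum(is.infinite(x)) else NA,
    n_negative    = if (numeric_ok) sum(x < 0, na.rm = TRUE) else NA,
    n_non_integer = if (numeric_ok) sum(x != round(x), na.rm = TRUE) else NA,
    empty_cells   = if (numeric_ok) sum(rowSums(x, na.rm = TRUE) == 0) else NA,
    frac_zero     = if (numeric_ok) mean(x == 0, na.rm = TRUE) else NA
  )
}

# Toy example: 3 cells x 4 genes, the second cell has no counts at all
counts <- matrix(c(0, 3, 1, 0,
                   0, 0, 0, 0,
                   2, 5, 0, 1), nrow = 3, byrow = TRUE)
str(check_count_matrix(counts))
check_count_matrix(counts)$empty_cells           # 1
check_count_matrix(counts / 2.5)$n_non_integer   # 4 (values that are not whole counts)

# On the real data: str(check_count_matrix(SP1_mat_transposed))
```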
I don't know how to troubleshoot this because no error is shown; my screen just looks like this:
> select.out_SP1_mat = selectGenes(SP1_mat_transposed, type="count", n.cores=25)
Removed 32820 genes that are expressed in less than 10 cells.
Selecting from remaining 874 genes...
SPCA selection...
DESCEND selection...
[1] "DESCEND starts deconvolving distribution of 874 genes!"
[1] "Estimating the time to finish this task ..."
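The log shows that only 874 of the 33694 genes survive the expression filter. The following is a minimal sketch that reproduces the rule stated in the log message, assuming cells are rows and genes are columns (as in `SP1_mat_transposed`). You can use it to compare how many genes pass, and how widely they are expressed, in this matrix versus the one that finished earlier. The helper name is hypothetical, and SOUP's internal implementation may differ in detail.

```r
# Hypothetical helper mirroring the log message
# "Removed ... genes that are expressed in less than 10 cells".
genes_passing_filter <- function(x, min_cells = 10) {
  cells_expressing <- colSums(x > 0)   # cells (rows) with a non-zero count, per gene
  cells_expressing >= min_cells
}

# Toy example: 20 cells x 3 genes
x <- cbind(gene1 = rep(1, 20),
           gene2 = c(rep(2, 5), rep(0, 15)),
           gene3 = c(rep(0, 10), rep(4, 10)))
keep <- genes_passing_filter(x)
print(keep)
sum(keep)   # 2: gene2 is expressed in only 5 of the 20 cells
summary(colSums(x > 0))

# On the real data, compare sum(genes_passing_filter(SP1_mat_transposed))
# with the same figure for the matrix that finished in two days.
```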
Here are the links to the SOUP tutorial PDF and to the source code of "selectGenes" and DESCEND:
https://github.com/lingxuez/SOUPR/blob/master/vignettes/SOUP-vignette.pdf
https://rdrr.io/github/lingxuez/SOUP/src/R/geneSelect.R
https://rdrr.io/github/jingshuw/descend/src/R/descend.R
Can anyone suggest how I can troubleshoot or solve this issue? I am running this on a cluster with 64 cores and 1 TB of RAM.
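Since the run stops printing after "Estimating the time to finish this task ...", one way to get an estimate yourself is to time the call on small random subsets of genes and extrapolate. `time_on_subsets` is a hypothetical helper, and the toy `per_gene_work` function only stands in for the real call. The linear extrapolation assumes the cost grows roughly in proportion to the number of genes, which may not hold.

```r
# Hypothetical helper: time `fun` on random subsets of columns (genes) of `x`.
time_on_subsets <- function(fun, x, sizes, seed = 1) {
  set.seed(seed)
  elapsed <- vapply(sizes, function(k) {
    cols <- sample(ncol(x), k)
    system.time(fun(x[, cols, drop = FALSE]))[["elapsed"]]
  }, numeric(1))
  data.frame(n_genes = sizes, seconds = elapsed)
}

# Toy stand-in for a per-gene computation (200 cells x 100 genes)
toy <- matrix(rpois(200 * 100, lambda = 1), nrow = 200)
per_gene_work <- function(m) apply(m, 2, function(g) { Sys.sleep(0.001); mean(g) })

timings <- time_on_subsets(per_gene_work, toy, sizes = c(10, 20, 40))
print(timings)

# Rough linear extrapolation to 874 genes (assumes cost is linear in genes)
fit <- lm(seconds ~ n_genes, data = timings)
predict(fit, newdata = data.frame(n_genes = 874))
```

With SOUP loaded, `fun` could be something like `function(m) selectGenes(m, type = "count", n.cores = 25)`. The subsets should ideally be drawn from genes that pass the expression filter. Keep in mind that `selectGenes` also runs its own filter and an SPCA step, so very small subsets may not behave like the full run.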
This is what I ran before calling the function mentioned above:
library(SOUP)
library(ggplot2)
> SP1 <-read.table(file = "/home/mverma/SINGLE_CELL_data/spleen/raw_gene_bc_matrices/spleen_subset_test/analysis/SP1_matrix.tsv", sep = '\t', header = TRUE)
> SP1_mat <- as.matrix(x = SP1)
> dim(SP1_mat)
[1] 33694 10000
> SP1_mat_transposed <- t(SP1_mat)
> dim(SP1_mat_transposed)
[1] 10000 33694
> log.expr_SP1_mat= log2(scaleRowSums(SP1_mat_transposed)*(10^6) + 1)
> dim(log.expr_SP1_mat)
[1] 10000 33694
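The `log.expr_SP1_mat` step divides each cell by its total count. Any cell with zero total counts therefore becomes a row of NaN (0/0), which is worth ruling out even though `selectGenes` is called on the raw counts here. I am not certain how DESCEND treats such cells. The following is a minimal base-R sketch, assuming `scaleRowSums` divides each row by its row sum. `scale_row_sums_sketch` is a hypothetical stand-in, not SOUP's function.

```r
# Hypothetical stand-in for scaleRowSums(): divide each row (cell) by its total.
# rowSums(x) has one value per row and is recycled down the columns,
# so entry [i, j] is divided by the total of row i.
scale_row_sums_sketch <- function(x) {
  x / rowSums(x)
}

# Toy example: 3 cells x 2 genes, the second cell has no counts
counts <- matrix(c(1, 3,
                   0, 0,
                   2, 2), nrow = 3, byrow = TRUE)
log_cpm <- log2(scale_row_sums_sketch(counts) * 10^6 + 1)
print(log_cpm)   # row 2 is NaN for every gene

empty <- which(rowSums(counts) == 0)
print(empty)     # 2

# Dropping empty cells before normalising avoids the NaN rows
counts_nonempty <- counts[rowSums(counts) > 0, , drop = FALSE]
```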