The basic aim of my question is how to achieve the best performance of matrix operations in R using the `Matrix` package. In particular, I want to parallelize operations (multiplication) and work with sparse matrices using computation on a CUDA GPU.
Details
According to the documentation of the `Matrix` package on CRAN:

> A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using 'LAPACK' and 'SuiteSparse' libraries.
It seems that, thanks to SuiteSparse, I should be able to perform basic operations on sparse matrices using the GPU (CUDA). In particular, the SuiteSparse documentation lists the following:
> SSMULT and SFMULT: sparse matrix multiplication.
On my Gentoo system I have installed `suitesparse-4.2.1` along with `suitesparseconfig-4.2.1-r1`. I also have `lapack`, `scalapack` and `blas`. The R `sessionInfo()` output looks as follows:
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
Matrix products: default
BLAS: /usr/lib64/blas/reference/libblas.so.0.0.0
LAPACK: /usr/lib64/lapack/reference/liblapack.so.0.0.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-10
loaded via a namespace (and not attached):
[1] compiler_3.4.1 grid_3.4.1 lattice_0.20-35
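(As a side note on the output above: the "reference" BLAS/LAPACK shown there are single-threaded and unoptimized, so even dense products will not be parallelized with this setup. A minimal way to confirm at runtime which libraries R is actually using, assuming R >= 3.4 where `La_library()` and the `BLAS` field of `sessionInfo()` are available:

```r
# Path of the LAPACK shared library R was loaded with (R >= 3.4)
La_library()

# sessionInfo() also records the BLAS path (R >= 3.4)
sessionInfo()$BLAS
```

If these point at the reference implementations, swapping in an optimized BLAS such as OpenBLAS would at least speed up dense operations, though it does not affect sparse-sparse multiplication.)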
I have also set the environment variable:

export CHOLMOD_USE_GPU=1

which I found on a forum and which should potentially allow GPU usage.
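(As far as I understand, `CHOLMOD_USE_GPU` only affects CHOLMOD's supernodal Cholesky factorization, not sparse multiplication. A sketch of a test that would at least exercise that code path - my assumption, not verified to actually reach the GPU:

```r
library(Matrix)

Sys.setenv(CHOLMOD_USE_GPU = "1")   # must be set before the factorization

set.seed(42)
M1 <- rsparsematrix(5000, 5000, 0.01)
# Build a symmetric positive definite sparse matrix
A <- forceSymmetric(crossprod(M1) + Diagonal(5000))

# CHOLMOD factorization - the only place the GPU flag could matter
system.time(ch <- Cholesky(A))
```

So even if the flag were honored, a plain `%*%` would remain CPU-only.)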
Basically, everything looks ready to go. However, when I run a simple test:

library(Matrix)
M1 <- rsparsematrix(10000, 10000, 0.01)
M <- M1 %*% t(M1)
it seems that the GPU is not used, as if R ignored the SuiteSparse features.
I know the questions are quite broad, but:

- Does anyone know whether R should be compiled in a specific, strict way to work with SuiteSparse?
- How can I make sure that the `Matrix` package uses all shared libraries for parallelization and sparse operations (with GPU usage)?
- Can anyone confirm that they were able to run matrix operations on CUDA/GPU using the `Matrix` package?
As far as I have looked through Stack Overflow and other forums, this question shouldn't be a duplicate.
- The `Matrix` package contains a subset of `SuiteSparse`, and this subset is built into the package. So `Matrix` doesn't use your system `SuiteSparse` (you can easily browse the `Matrix` source code here).
- `sparse_matrix * sparse_matrix` multiplications are hard to parallelize efficiently - strategies vary a lot depending on the structure of both matrices.
- You could try splitting the work yourself with `mclapply`. I doubt this will help.
- You can try `Eigen` and `RcppEigen` and perform SSMULT there. I believe it could be quite faster (but still single-threaded).
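A minimal sketch of the `mclapply` route (my own illustration, not the answer author's code): split the rows of `M1` into blocks, multiply each block on a separate core, and bind the results.

```r
library(Matrix)
library(parallel)

set.seed(1)
M1 <- rsparsematrix(10000, 10000, 0.01)

ncores <- 4
idx <- splitIndices(nrow(M1), ncores)   # row blocks, one per core
parts <- mclapply(idx,
                  function(i) M1[i, , drop = FALSE] %*% t(M1),
                  mc.cores = ncores)
M <- do.call(rbind, parts)              # same result as M1 %*% t(M1)
```

Note that each worker still traverses all of `t(M1)`, which is why, as the answer says, the speedup from this kind of chunking is limited.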
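And a sketch of the `RcppEigen` route (assuming `Rcpp` and `RcppEigen` are installed; `eigen_ssmult` is a name I made up for illustration). Eigen's sparse-sparse product plays the role of SSMULT here:

```r
library(Rcpp)

sourceCpp(code = '
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

// [[Rcpp::export]]
Eigen::SparseMatrix<double> eigen_ssmult(
    const Eigen::MappedSparseMatrix<double>& A,
    const Eigen::MappedSparseMatrix<double>& B) {
  return A * B;   // Eigen sparse * sparse product (single-threaded)
}')

library(Matrix)
M1 <- rsparsematrix(1000, 1000, 0.01)
M  <- eigen_ssmult(M1, t(M1))   # dgCMatrix in, dgCMatrix out
```

The `dgCMatrix` objects from `Matrix` map directly onto Eigen's compressed-column sparse format, so no copying of the inputs is needed.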