Matrix operations in R: parallelization, sparse operations, GPU computation


My basic question is how to get the best performance from matrix operations in R using the Matrix package. In particular, I want to parallelize operations (multiplication) and work with sparse matrices using computation on a CUDA GPU.

Details

According to the documentation of the Matrix package on CRAN:

A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using 'LAPACK' and 'SuiteSparse' libraries.

It seems that, thanks to SuiteSparse, I should be able to perform basic operations on sparse matrices using the GPU (CUDA). In particular, the SuiteSparse documentation lists the following:

SSMULT and SFMULT: sparse matrix multiplication.

On my Gentoo system I have installed suitesparse-4.2.1 along with suitesparseconfig-4.2.1-r1. I also have lapack, scalapack and blas installed. The output of sessionInfo() in R looks as follows:

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux

Matrix products: default
BLAS: /usr/lib64/blas/reference/libblas.so.0.0.0
LAPACK: /usr/lib64/lapack/reference/liblapack.so.0.0.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Matrix_1.2-10

loaded via a namespace (and not attached):
[1] compiler_3.4.1  grid_3.4.1      lattice_0.20-35

I have also set the environment variable:

export CHOLMOD_USE_GPU=1

which I found on a forum and which should potentially enable GPU usage.

Basically, everything looks ready to go. However, when I run a simple test:

library(Matrix)
M1 <- rsparsematrix(10000, 10000, 0.01)
M <- M1 %*% t(M1)

It seems that the GPU is not being used, as if R ignores the SuiteSparse features.
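Before looking at the GPU at all, it can help to record a CPU baseline for this test. The sketch below times the explicit product against tcrossprod() from the Matrix package, which computes the same M1 %*% t(M1) without forming the transpose by hand. The matrix is smaller than the 10000 x 10000 case above (an assumption made only to keep the run short), and the timings say nothing about whether a GPU is involved.

```r
library(Matrix)

set.seed(1)
# Smaller than the 10000 x 10000 test above, chosen only so this runs quickly
M1 <- rsparsematrix(1000, 1000, 0.01)

# Explicit product, as in the test above
t_explicit <- system.time(A <- M1 %*% t(M1))

# Same product via tcrossprod(); the result is stored as a symmetric sparse matrix
t_tcross <- system.time(B <- tcrossprod(M1))

print(t_explicit)
print(t_tcross)

# Sanity check: both approaches should give the same numbers
max_diff <- max(abs(as.matrix(A) - as.matrix(B)))
print(max_diff)
```

If the two timings are close and CPU usage stays on a single core, that is consistent with the product running entirely in the package's own CPU code.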

I know the questions are quite broad, but:

  • Does anyone know whether R has to be compiled in a specific way to work with SuiteSparse?
  • How can I make sure that the Matrix package uses all the shared libraries for parallelization and sparse operations (with GPU usage)?
  • Can anyone confirm that they were able to run matrix operations on a CUDA GPU using the Matrix package?

As far as I could find on Stack Overflow and other forums, this question should not be a duplicate.


There is 1 answer

Answer by Dmitriy Selivanov (best answer)
  1. It is not as easy as you describe. The Matrix package contains only a subset of SuiteSparse, and this subset is bundled with the package. So Matrix doesn't use your system SuiteSparse (you can easily verify this by browsing the Matrix source code).
  2. sparse_matrix * sparse_matrix multiplications are hard to parallelize efficiently: strategies vary a lot depending on the structure of both matrices.
  3. In many cases the computations are memory-bound, not CPU-bound.
  4. You may get worse performance on a GPU than on a CPU because of the memory issues described above, plus memory access patterns.
  5. To my knowledge there are a couple of libraries that implement multithreaded SSMULT, Intel MKL and librsb, but I haven't heard of an R interface to them.
  6. If the matrix is huge, you can partition it manually and use standard mclapply. I doubt this will help.
  7. You can try to use Eigen via RcppEigen and perform SSMULT there. I believe it could be considerably faster (but still single-threaded).
  8. Ultimately, I would think about how to reformulate the problem and avoid SSMULT.
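The manual partitioning mentioned in point 6 could look roughly like the sketch below. It splits the rows of the left-hand matrix into contiguous blocks, multiplies each block by t(M1) in a forked worker via mclapply, and stacks the partial results with rbind. The matrix size, n_chunks and n_cores are illustrative assumptions, not tuned values, and as noted above this may well not be faster than the plain product, since each worker still needs the full right-hand matrix and the results have to be copied back.

```r
library(Matrix)
library(parallel)

set.seed(1)
# Illustrative size; the question uses 10000 x 10000
M1  <- rsparsematrix(2000, 2000, 0.01)
M1t <- t(M1)

# Hypothetical settings for this sketch
n_chunks <- 4
# mclapply relies on forking, which is not available on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Contiguous row blocks: a list of integer index vectors
rows <- seq_len(nrow(M1))
row_blocks <- unname(split(rows, cut(rows, n_chunks, labels = FALSE)))

# Each worker computes one horizontal slice of the product
parts <- mclapply(
  row_blocks,
  function(idx) M1[idx, , drop = FALSE] %*% M1t,
  mc.cores = n_cores
)

# Stack the slices back together in their original row order
M_par <- do.call(rbind, parts)

# Compare against the single-process result
M_ref <- M1 %*% M1t
max_diff <- max(abs(M_par - M_ref))
print(max_diff)
```

Wrapping both variants in system.time() on the real matrix is the simplest way to see whether the forking overhead pays off for a given sparsity structure.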