CLAPACK f2c vs MKL: Matrix multiplication performance issue

I am looking for a way to speed up a program that performs a lot of matrix multiplications, so I replaced the CLAPACK f2c libraries with MKL. Unfortunately, the performance results were not what I expected.

After investigating, I came across a block triangular matrix that gives poor performance, mainly when I try to multiply it by its transpose.

In order to simplify the problem, I ran my tests with a 5000x5000 identity matrix (which shows the same behaviour).
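
For reference, here is a stripped-down sketch of the kind of timing I do (a simplification of my real test harness, so the details are illustrative only; I call cblas_dgemm through MKL's CBLAS interface here, and the equivalent dgemm_ routine in the CLAPACK f2c build):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <mkl.h>   /* cblas_dgemm; the CLAPACK f2c build calls dgemm_ instead */

    int main(void)
    {
        const int n = 5000;
        double *a = calloc((size_t)n * n, sizeof *a);
        double *c = calloc((size_t)n * n, sizeof *c);

        /* build the 5000x5000 identity matrix (column-major) */
        for (int i = 0; i < n; ++i)
            a[i + (size_t)i * n] = 1.0;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        /* C := 1.0 * A * A^T + 0.0 * C */
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,
                    n, n, n, 1.0, a, n, a, n, 0.0, c, n);

        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("dgemm: %.6f s\n",
               (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

        free(a);
        free(c);
        return 0;
    }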

Name                                                Matrix [Size,Size]   CLAPACK f2c (seconds)   MKL_GNU_THREAD (seconds)
Multiplication of an identity matrix by itself      5000                 0.076536                1.090167
Multiplication of a dense matrix by its transpose   5000*5000            93.71569                1.113872
  • We can see that the CLAPACK f2c multiplication of the identity matrix is about 14x faster than MKL.
  • For the dense matrix multiplication, MKL is about 84x faster than CLAPACK f2c.

Moreover, with MKL the time spent on the dense*denseT multiplication and on the identity multiplication is almost identical.

So I looked into the CLAPACK f2c DGEMM source to find where the optimization for multiplying a sparse matrix comes from, and I found a condition on null values:

/*           Form  C := alpha*A*B + beta*C. */

            i__1 = *n;
            for (j = 1; j <= i__1; ++j) {
                if (*beta == 0.) {
                    i__2 = *m;
                    for (i__ = 1; i__ <= i__2; ++i__) {
                        c__[i__ + j * c_dim1] = 0.;
/* L50: */
                    }
                } else if (*beta != 1.) {
                    i__2 = *m;
                    for (i__ = 1; i__ <= i__2; ++i__) {
                        c__[i__ + j * c_dim1] = *beta * c__[i__ + j * c_dim1];
/* L60: */
                    }
                }
                i__2 = *k;
                for (l = 1; l <= i__2; ++l) {
                    if (b[l + j * b_dim1] != 0.) { /* HERE THE CONDITION */
                        temp = *alpha * b[l + j * b_dim1];
                        i__3 = *m;
                        for (i__ = 1; i__ <= i__3; ++i__) {
                            c__[i__ + j * c_dim1] += temp * a[i__ + l * a_dim1];
/* L70: */
                        }
                    } /* END of the condition */

When I removed this condition, I got these results:

Name                                                Matrix [Size,Size]   CLAPACK f2c (seconds)   MKL_GNU_THREAD (seconds)
Multiplication of an identity matrix by itself      5000                 93.210873               1.090167
Multiplication of a dense matrix by its transpose   5000*5000            93.71569                1.113872
  • Here we note that the multiplications of the dense matrix and of the identity matrix are now very close in terms of performance, and MKL shows the best results.
  • The MKL multiplication seems to be faster than CLAPACK f2c only when both process the same number of non-null elements.

I have two ideas about these results:

  1. The zero optimization is not activated by default in MKL.

  2. MKL cannot see the zero (double) values inside my sparse matrices.

Can you tell me why MKL shows this performance issue? Do you have any tips on how to skip the multiplications on null elements with dgemm?

I did a conversion to CSR and it shows better performance, but in that case, why is lapacke_dgemm worse than f2c_dgemm?
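
(By "conversion to CSR" I mean something along the lines of the sketch below, written in plain C for illustration rather than with MKL's sparse API; the struct and function names are made up. The point is that only the stored non-zero entries are touched, which is what the f2c condition achieves implicitly.)

    #include <stddef.h>

    /* Illustration only: CSR * dense, column-major B and C. */
    typedef struct {
        int n;          /* matrix is n x n */
        int *row_ptr;   /* size n + 1      */
        int *col_idx;   /* size nnz        */
        double *val;    /* size nnz        */
    } csr_matrix;

    /* C := A * B, A stored in CSR, B and C dense n x n matrices */
    static void csr_times_dense(const csr_matrix *A, const double *B, double *C)
    {
        int n = A->n;
        for (int j = 0; j < n; ++j)            /* column of B and C */
            for (int i = 0; i < n; ++i) {      /* row of A          */
                double s = 0.0;
                for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p)
                    s += A->val[p] * B[A->col_idx[p] + (size_t)j * n];
                C[i + (size_t)j * n] = s;
            }
    }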

Thank you for your help :)

MKL_VERBOSE Intel(R) MKL 2021.0 Update 1 Product build 20201104 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.50GHz lp64 gnu_thread
