CLAPACK f2c vs MKL : Matrix multiplication performance issue

211 views Asked by seb Lucass At 31 December 2024 at 01:02

I am looking for a way to speed up a program that performs a lot of matrix multiplications, so I replaced the CLAPACK f2c libraries with MKL. Unfortunately, the performance results were not what I expected.

After investigating, I found that a block triangular matrix gives bad performance, mainly when I multiply it by its transpose.

To simplify the problem, I ran my tests with an identity matrix of size 5000 (I observed the same behavior).

Test	Matrix size	CLAPACK f2c (seconds)	MKL_GNU_THREAD (seconds)
Multiplication of an identity matrix by itself	5000 x 5000	0.076536	1.090167
Multiplication of a dense matrix by its transpose	5000 x 5000	93.71569	1.113872

We can see that CLAPACK f2c multiplies the identity matrix about 14 times faster than MKL.
For the dense matrix multiplication, on the other hand, MKL is about 84 times faster than CLAPACK f2c.

Moreover, with MKL the difference in run time between the dense * dense^T multiplication and the identity multiplication is very small.

So I looked in the CLAPACK f2c DGEMM for the optimization that applies to sparse matrices, and I found a test on zero values.
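The timings above can be put in terms of operation counts. The following is only a rough sketch under the usual assumption that a dense n x n product costs 2*n^3 floating-point operations, and that a loop which skips zero entries of B does one length-n column update per nonzero of B. The function names are hypothetical and are not part of CLAPACK or MKL.

```c
#include <stdio.h>

/* Hypothetical helpers for back-of-the-envelope arithmetic only. */

/* Dense n x n times n x n: n*n outputs, each needing n multiplies and n adds. */
double dense_gemm_flops(double n)
{
    return 2.0 * n * n * n;
}

/* Zero-skipping loop: each nonzero of B triggers one column update
   of length n (n multiplies and n adds). */
double skip_zero_flops(double n, double nnz_b)
{
    return 2.0 * n * nnz_b;
}

/* Achieved rate in GFLOP/s for a given operation count and run time. */
double gflops(double flops, double seconds)
{
    return flops / seconds / 1e9;
}

/* Print the counts for one matrix size and nonzero count. */
void print_counts(double n, double nnz_b)
{
    printf("dense: %.3g flops, zero-skipping: %.3g flops\n",
           dense_gemm_flops(n), skip_zero_flops(n, nnz_b));
}
```

With n = 5000, the dense count is 2.5e11 operations, while the identity matrix has only 5000 nonzeros, so a zero-skipping loop does about 5e7. Dividing 2.5e11 by the measured 1.113872 s gives roughly 224 GFLOP/s for MKL on the dense case, and MKL takes almost the same time on the identity matrix, which is what one would expect if it performs the full dense computation regardless of the values.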

/*           Form  C := alpha*A*B + beta*C. */

           i__1 = *n;
           for (j = 1; j <= i__1; ++j) {
             if (*beta == 0.) {
                 i__2 = *m;
                 for (i__ = 1; i__ <= i__2; ++i__) {
                    c__[i__ + j * c_dim1] = 0.;
/* L50: */
                 }
             } else if (*beta != 1.) {
                 i__2 = *m;
                 for (i__ = 1; i__ <= i__2; ++i__) {
                    c__[i__ + j * c_dim1] = *beta * c__[i__ + j * c_dim1];
/* L60: */
                 }
             }
             i__2 = *k;
             for (l = 1; l <= i__2; ++l) {
                 if (b[l + j * b_dim1] != 0.) { // HERE THE CONDITION
                    temp = *alpha * b[l + j * b_dim1];
                    i__3 = *m;
                    for (i__ = 1; i__ <= i__3; ++i__) {
                        c__[i__ + j * c_dim1] += temp * a[i__ + l * a_dim1];
/* L70: */
                    }
                 } // END of condition
             } // end of loop over l
           } // end of loop over j
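To make the effect of that test easier to see in isolation, here is a minimal, self-contained sketch of the same loop nest with 0-based, column-major indexing. It is not the CLAPACK source: the function name `gemm_ref_sketch`, the `skip_zeros` flag and the returned update counter are all assumptions added for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch, not the CLAPACK routine.
   Computes C := alpha*A*B + beta*C for column-major matrices:
   A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
   If skip_zeros is nonzero, zero entries of B are skipped, as in the
   reference loop. Returns the number of column updates performed. */
size_t gemm_ref_sketch(size_t m, size_t n, size_t k, double alpha,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double beta, double *c, size_t ldc,
                       int skip_zeros)
{
    size_t updates = 0;
    for (size_t j = 0; j < n; ++j) {
        /* Scale column j of C by beta (beta == 0 overwrites C). */
        for (size_t i = 0; i < m; ++i)
            c[i + j * ldc] = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];

        for (size_t l = 0; l < k; ++l) {
            double blj = b[l + j * ldb];
            if (skip_zeros && blj == 0.0)
                continue;               /* the shortcut under discussion */
            double temp = alpha * blj;
            /* Add temp * (column l of A) to column j of C. */
            for (size_t i = 0; i < m; ++i)
                c[i + j * ldc] += temp * a[i + l * lda];
            ++updates;
        }
    }
    return updates;
}
```

For an n x n identity matrix multiplied by itself, the counter is n with the shortcut and n*n without it, so for n = 5000 the inner loop runs 5000 times less often. Note that the shortcut is not free of side effects: when A contains NaN or Inf, skipping a zero of B gives a different result than multiplying by it.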

When I removed this condition, I got the following results:

Test	Matrix size	CLAPACK f2c (seconds)	MKL_GNU_THREAD (seconds)
Multiplication of an identity matrix by itself	5000 x 5000	93.210873	1.090167
Multiplication of a dense matrix by its transpose	5000 x 5000	93.71569	1.113872

Here we can see that the dense and identity multiplications are very close in terms of performance, and MKL now shows the best performance in both cases.
MKL seems to be faster than CLAPACK f2c, but only when both process the same number of nonzero elements.

I have two hypotheses about these results:

The zero optimization is not enabled by default in MKL.
MKL cannot detect the 0 (double) values inside my sparse matrices.

Could you tell me why MKL shows these performance issues? Do you have any tips for skipping the multiplications by zero elements with dgemm?

I converted the matrix to CSR and it shows better performance, but in this case why is lapacke_dgemm worse than f2c_dgemm?
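For reference, this is roughly what a CSR-based product looks like when written by hand. It is only a sketch under stated assumptions: 0-based CSR arrays (`row_ptr`, `col_idx`, `val`), a dense column-major right-hand side, and a hypothetical function name `csr_times_dense`. It is not the MKL sparse API and not the code I actually ran.

```c
#include <stddef.h>

/* Hypothetical sketch: C := A * B, where A is m x k in CSR form (0-based)
   and B (k x n) and C (m x n) are dense, column-major, with leading
   dimensions ldb and ldc.
   row_ptr has m + 1 entries; the nonzeros of row i are
   val[row_ptr[i] .. row_ptr[i+1]-1], in columns col_idx[...]. */
void csr_times_dense(size_t m, size_t n,
                     const size_t *row_ptr, const size_t *col_idx,
                     const double *val,
                     const double *b, size_t ldb,
                     double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            /* Only the stored nonzeros of row i are visited. */
            for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
                sum += val[p] * b[col_idx[p] + j * ldb];
            c[i + j * ldc] = sum;
        }
    }
}
```

The work here is proportional to the number of stored nonzeros times n, not to m * k * n, which is consistent with the CSR version being faster on matrices that are mostly zero.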

Thank you for your help :)

MKL_VERBOSE Intel(R) MKL 2021.0 Update 1 Product build 20201104 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.50GHz lp64 gnu_thread
