Does OpenBLAS use a fast matrix-multiplication algorithm?


I get 55 Gflop/s when multiplying two 10000 x 10000 matrices (counting 2 * 10000^3 flops for the entire computation). This was measured on a single core of an AMD Epyc 7313 running at 3.0 GHz (boost clock turned off). Unless I am mistaken, the peak performance of one core is

2 (flops per FMA) * 4 (AVX2 lanes) * 2 (FMAs per clock) * 3 (GHz) = 48 Gflop/s
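To make the arithmetic easy to recheck, here is a minimal sketch of the same estimate. The function and parameter names are hypothetical, chosen only for illustration, and the inputs are the figures quoted above (4 lanes per AVX2 register, 2 FMAs per clock, 3.0 GHz).

```python
def peak_gflops(flops_per_fma, simd_lanes, fma_per_cycle, clock_ghz):
    """Theoretical per-core peak in Gflop/s (hypothetical helper)."""
    return flops_per_fma * simd_lanes * fma_per_cycle * clock_ghz


def achieved_gflops(n, seconds):
    """Rate for an n x n matrix product, counting 2 * n^3 flops."""
    return 2 * n**3 / seconds / 1e9


peak = peak_gflops(flops_per_fma=2, simd_lanes=4, fma_per_cycle=2, clock_ghz=3.0)
print(peak)  # 48.0

# Ratio of the measured 55 Gflop/s to the estimated peak
print(f"measured / peak = {55 / peak:.3f}")
```

A ratio above 1 means either the peak estimate is off (lane count, clock, or number of cores actually used) or the library performs fewer than 2 * n^3 flops.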

So does the implementation use something like Strassen's algorithm? According to this paper it should be practical at 10k x 10k, but I do not see any mention of Strassen or Winograd in the codebase.
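For reference, a rough sketch of how much a Strassen-style scheme could inflate the apparent rate. It assumes each recursion level replaces 8 half-size products with 7 and ignores the cost of the extra matrix additions, so it is an upper bound. It is not a claim about what OpenBLAS does, and `apparent_gflops` is a hypothetical helper.

```python
import math


def apparent_gflops(true_gflops, levels):
    """Rate reported when 2 * n^3 flops are counted but only (7/8)^levels
    of the multiplications are actually executed (additions ignored)."""
    return true_gflops * (8 / 7) ** levels


# Apparent rate for 0 to 3 recursion levels at the 48 Gflop/s peak
for levels in range(4):
    print(levels, round(apparent_gflops(48.0, levels), 2))

# Number of levels needed to turn 48 Gflop/s into an apparent 55 Gflop/s
print(math.log(55 / 48) / math.log(8 / 7))
```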


There are 0 answers