Intel TBB Performance


I took a TBB matrix multiplication from here

This example uses blocked_range with parallel_for loops. I also ran a couple of programs using the Intel MKL and Eigen libraries. When I compare the times taken by these implementations, MKL is the fastest and TBB is the slowest (10 times slower than Eigen on average) for a variety of matrix sizes (2-4096). Is it normal, or am I doing something wrong? Shouldn't TBB perform better than Eigen, at least?


There are 2 answers

Olivier

That looks like a really basic matrix multiplication algorithm, meant as little more than an example of how to use TBB. There are far better ones, and I'm fairly certain the Intel MKL will be using SSE / AVX / FMA instructions too.

To put it another way, there wouldn't be any point to the Intel MKL if you could replicate its performance with 20 lines of code. So yes, what you get seems normal.

At the very least, with large matrices, the algorithm needs to take cache and other details of the memory subsystem into account.
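To make the cache point concrete, here is a minimal sketch of a tiled (cache-blocked) multiplication in plain C++. It is an illustration only, not what MKL or Eigen actually do: the function name `matmul_tiled`, the row-major flat-vector layout, and the default `tile` of 64 are all assumptions, and a good tile size depends on the machine's cache sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: C = A * B for row-major n x n matrices stored in flat vectors.
// `tile` is a hypothetical tuning parameter, not a recommended value.
std::vector<double> matmul_tiled(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n, std::size_t tile = 64) {
    assert(a.size() == n * n && b.size() == n * n && tile > 0);
    std::vector<double> c(n * n, 0.0);
    // Walk the matrices tile by tile so each small block of A, B and C
    // is reused while it is still likely to be in cache.
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        // Innermost loop reads B and writes C contiguously.
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Even this only addresses memory access order. It does nothing about vectorization, register blocking or threading, which is where a tuned library gets much of the rest of its speed.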

eerorika

Is it normal

Yes, it is normal for one program to be slower than another by a factor of 10.

Shouldn't TBB perform better than Eigen, at least?

I don't see any reason why a naïve implementation of matrix multiplication using TBB would perform better than, or even close to, a dedicated, optimized library designed for fast linear algebra.
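For reference, a naïve kernel of the kind being discussed has roughly the shape sketched below. This is an assumption about the example, not the code from the question: the names `multiply_rows` and `matmul_naive` and the `grain` parameter are hypothetical, and the driver runs the row chunks one after another so that it compiles without TBB. In a TBB version, each `[row_begin, row_end)` chunk would instead be a `blocked_range` handed to `parallel_for`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive kernel: computes rows [row_begin, row_end) of C = A * B
// for row-major n x n matrices stored in flat vectors.
void multiply_rows(const std::vector<double>& a, const std::vector<double>& b,
                   std::vector<double>& c, std::size_t n,
                   std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                // B is read down a column, i.e. with stride n:
                // poor cache behaviour once n is large.
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Sequential stand-in for the parallel driver: splits the rows into
// chunks of `grain` rows (hypothetical parameter) and processes each one.
std::vector<double> matmul_naive(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n, std::size_t grain = 16) {
    assert(a.size() == n * n && b.size() == n * n && grain > 0);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t begin = 0; begin < n; begin += grain) {
        multiply_rows(a, b, c, n, begin, std::min(begin + grain, n));
    }
    return c;
}
```

Splitting the rows across threads divides the work, but each thread still runs the same unblocked, unvectorized inner loop, so the per-core efficiency stays far below that of a tuned library.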