I took a TBB matrix multiplication example from here.
This example uses the concept of blocked_range for parallel_for loops. I also ran a couple of programs using the Intel MKL and Eigen libraries. When I compare the times taken by these implementations, MKL is the fastest, while TBB is the slowest (10 times slower than Eigen on average) across a variety of matrix sizes (2-4096). Is this normal, or am I doing something wrong? Shouldn't TBB perform at least as well as Eigen?
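The linked code isn't reproduced here, but the TBB tutorial-style multiply is roughly the following shape (a minimal sketch, assuming row-major matrices stored in std::vector; the function name multiply is illustrative, not from the linked source):

```cpp
#include <cstddef>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

// Naive O(n^3) multiply, parallelized over rows with tbb::parallel_for.
// C is assumed to be pre-sized to n*n.
void multiply(const std::vector<double>& A,
              const std::vector<double>& B,
              std::vector<double>& C,
              std::size_t n)
{
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
        [&](const tbb::blocked_range<std::size_t>& rows) {
            for (std::size_t i = rows.begin(); i != rows.end(); ++i)
                for (std::size_t j = 0; j < n; ++j) {
                    double sum = 0.0;
                    for (std::size_t k = 0; k < n; ++k)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
        });
}
```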
That looks like a really basic matrix multiplication algorithm, meant as little more than an example of how to use TBB. There are far better algorithms, and I'm fairly certain Intel MKL uses SSE / AVX / FMA instructions too.
To put it another way, there wouldn't be any point to the Intel MKL if you could replicate its performance with 20 lines of code. So yes, what you get seems normal.
At the very least, with large matrices, the algorithm needs to take cache and other details of the memory subsystem into account.
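To illustrate the cache point only (this is not MKL's or Eigen's actual kernel, just the loop-tiling idea; BLOCK, multiply_blocked, and the zero-initialized C are assumptions for the sketch):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tile size chosen so a few BLOCK x BLOCK tiles fit in L1/L2 cache;
// the right value depends on the CPU and element type.
constexpr std::size_t BLOCK = 64;

// Cache-blocked (tiled) serial multiply; C must be zero-initialized.
void multiply_blocked(const std::vector<double>& A,
                      const std::vector<double>& B,
                      std::vector<double>& C,
                      std::size_t n)
{
    for (std::size_t ii = 0; ii < n; ii += BLOCK)
        for (std::size_t kk = 0; kk < n; kk += BLOCK)
            for (std::size_t jj = 0; jj < n; jj += BLOCK)
                // The inner three loops touch only small tiles of A, B and C,
                // so the working set stays cache-resident instead of
                // streaming the whole of B for every row of A.
                for (std::size_t i = ii; i < std::min(ii + BLOCK, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BLOCK, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + BLOCK, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

On top of tiling like this, optimized BLAS kernels add vectorization, register blocking, data packing and careful threading, which is where the remaining gap comes from.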