Understanding FMA instruction performance

Question

2.3k views · Asked by Peter L. on 07 January 2017 at 23:53

I'm trying to understand how I can max out the number of floating-point operations I get from my CPU. I'm writing a simple matrix multiplication program, and I have a Skylake processor. I was looking at the Wikipedia page on FLOPS for this architecture, and I'm having difficulty understanding it.

From my understanding, FMA instructions take three floating-point inputs, right? And they let you mix adds and multiplies between them. But what happens when I only add two floats? Does it simply multiply one of them by one? Can I add three floats in one cycle, or will that be split? I saw that Skylake has 32 FLOPs/cycle for single-precision inputs, but what does "two 8-wide FMA instructions" mean?

Thank you in advance for the explanations.

**gnasher729** · Accepted Answer · 2017-01-08T00:16:40+00:00

FMA calculates ± a*b ± c in a single operation, with a single rounding error. That's what it does, nothing else. Calculating a + b + c cannot be done using an FMA instruction; you need two dependent ADD operations for that.
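A minimal C++ sketch of what "a single rounding error" means in practice, assuming a C++11 compiler with IEEE-754 doubles. `std::fma` from `<cmath>` is specified to compute a*b + c as if exactly and round once. The helper names are hypothetical; `unfused` stores the product through a `volatile` so the compiler cannot fuse the two steps by itself. `add_via_fma` relates to the question about plain adds: multiplying by 1.0 is exact, so an FMA with b = 1.0 rounds exactly like an ordinary add (a compiler will normally just emit the add instruction instead).

```cpp
#include <cassert>
#include <cmath>

// Fused: a*b + c computed exactly, then rounded once.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Unfused: the product is rounded to double, then the sum is rounded again.
// The volatile store keeps the compiler from contracting this into an FMA.
double unfused(double a, double b, double c) {
    volatile double product = a * b;  // first rounding
    return product + c;               // second rounding
}

// A plain add written as an FMA: a*1.0 is exact, so the single rounding
// of a*1.0 + b is the same as the rounding of a + b.
double add_via_fma(double a, double b) {
    return std::fma(a, 1.0, b);
}

// Inputs whose exact product, 1 - 2^-60, is not representable as a double.
void show_single_rounding() {
    double a = 1.0 + std::ldexp(1.0, -30);
    double b = 1.0 - std::ldexp(1.0, -30);
    assert(fused(a, b, -1.0) == -std::ldexp(1.0, -60)); // exact result survives
    assert(unfused(a, b, -1.0) == 0.0);                 // product rounded to 1.0 first
}
```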

Depending on the compiler, you may have to turn on a compiler option to allow the use of FMA instructions, because they don't give results identical to a multiply followed by an add. You may also have to rearrange your code in some cases. For example, `a*b + c*d + e` will be calculated as `x = a*b; y = FMA(c, d, x); z = y + e`, but `e + a*b + c*d` will be calculated as `x = FMA(a, b, e); z = FMA(c, d, x)`. The basic operation of an FFT can be performed with eight floating-point operations; with FMA it can be rewritten as four FMAs and two other operations, which count as 10 floating-point operations (each FMA counts as two) but take only six instructions instead of eight.
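The two evaluation orders from the paragraph above, written out by hand with `std::fma` so the operation count is visible. This is a sketch with hypothetical function names; it assumes the compiler does not reassociate the sums itself, which GCC and Clang generally won't do without flags such as `-ffast-math`, since a different order can round differently. With small integer inputs both orders are exact and agree.

```cpp
#include <cmath>

// a*b + c*d + e evaluated left to right: (a*b + c*d) + e.
// Only the second product can absorb a running sum, so this needs
// three operations: a multiply, an FMA and an add.
double sum_products_then_e(double a, double b, double c, double d, double e) {
    double x = a * b;              // multiply
    double y = std::fma(c, d, x);  // y = c*d + x
    return y + e;                  // add
}

// e + a*b + c*d evaluated left to right: (e + a*b) + c*d.
// Each product absorbs the running sum, so two FMAs suffice.
double e_then_sum_products(double a, double b, double c, double d, double e) {
    double x = std::fma(a, b, e);  // x = a*b + e
    return std::fma(c, d, x);      // c*d + x
}
```

For general inputs the two results can differ in the last bit or so, which is exactly why the rearrangement is left to you.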

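"Two 8-wide FMA instructions" means that each FMA instruction operates on 256-bit vector registers holding 8 single-precision floats each, and that the core can start two of these instructions in the same cycle. Each FMA lane counts as two floating-point operations (a multiply and an add), so the peak is 2 instructions × 8 floats × 2 FLOPs = 32 FLOPs per cycle.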
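The arithmetic behind the 32 FLOPs/cycle figure, as a small C++ sketch. The function and parameter names are hypothetical; the inputs are the Skylake figures from the question and answer (two FMA units, 256-bit vectors), and the only other assumption is the usual convention that one FMA lane counts as two FLOPs.

```cpp
// Peak floating-point operations per cycle for one core, assuming every
// cycle starts `fma_units` FMA instructions on full-width vectors.
// vector_bits / element_bits = lanes per register; each lane does a
// multiply and an add, hence the factor of 2.
constexpr int peak_flops_per_cycle(int fma_units, int vector_bits, int element_bits) {
    return fma_units * (vector_bits / element_bits) * 2;
}

// Skylake figures from the answer: two 256-bit FMA units.
static_assert(peak_flops_per_cycle(2, 256, 32) == 32, "single precision: 2 x 8 x 2");
static_assert(peak_flops_per_cycle(2, 256, 64) == 16, "double precision: 2 x 4 x 2");
```

Reaching that peak in a matrix multiply also means keeping both units busy every cycle. Assuming the commonly quoted 4-cycle FMA latency on Skylake, that takes roughly 2 × 4 = 8 independent FMA chains in flight (for example, several separate accumulators in the inner loop) rather than one long dependent chain.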

TechQA.
