How to calculate the total number of FOP and floating-point performance of special operations(exp sin sqrt)?

597 views Asked by At

When measuring an algorithm, if there are division operations, how to calculate the total number of FOP and floating-point performance?

For example, n2 matrix multiplication, the calculation of n3 * 2flops (a multiplication, an addition), assuming that using the same data set n2, we change multiplication operations of the matrix multiplication into the division operations, how to calculate flops. Is it the same with the result of matrix multiplication?

1

There are 1 answers

0
Margaret Bloom On

Alas there is not a standard that specifies what a floating-point operation is.
This is due to the fact that different architectures may have native support for a different set of operations.
So for example, architecture A1 may supports all the four basic operations, A2 only the addition and A3 all the basic operations plus exponentiation.

In general the term floating-point operations is highly contextualized and tied to a specific machine.

You can, however, make a good machine independent analysis by counting every kind of operation separately.
This requires a bit of expertise and voodoo, for example addition and subtraction are counted together because they are basically the same operation for the hardware.
Multiplications and divisions are counted separately, like are more complex operations (exponentiation, trigonometric functions and so on).

In the end you'll have a count for all the different operations.
For example multiplying an n × m matrix by an m × k one involves n·k·m multiplications and n·k·(m-1) additions. so the result is n·k·m MUL + n·k·(m-1) ADD.

From this "full-information" expression, which is usually a good result on its own, you can get an approximation of the number of "floating-point operations" by picking up a reference machine and the unit of measure.

For example the Skylake micro-architecture from Intel has this, very simplified timings table:

Operation             Cycles

Addition              0.5
Subtraction           0.5
Division              3
Multiplication        0.5

If we take the addition as the unit of measure for one FLOP, we can say that a division is as long as 6 additions, so it is like 6 FLOPs.

Operation             FLOPs

Addition              1   (By definition)
Subtraction           1
Division              6
Multiplication        1

So the example above reduces to n·k·(2·m - 1) since multiplication and addition all take only 1 FLOP to complete.

This is a simplified view, real machines are much more complicated (for example Skylake has vector units and FMA support that may change the unit of measurement and the timings).
Anyway, the expression in terms of the different kind of operations is machine independent and can be converted into a single number later when making a specific case.