Duff device in PHP not possible?


I've been told that a Duff's Device doesn't work in PHP because the switch and case constructs work differently. I found this Duff's Device on php.net; my question is, what is wrong with it? Or have I misunderstood what a Duff's Device is? In my assembler I can unroll a loop with a simple command, and when it compiles I get an unrolled loop.

<?php
$n = $ITERATIONS % 8;
while ($n--) $val++;
$n = (int)($ITERATIONS / 8);
while ($n--) {
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
}
?>

There are 2 answers

1
Mankarse On BEST ANSWER

That is not a Duff's Device. It uses a separate pre-loop alignment step (which is precisely what Duff's Device is designed to avoid).

In a true Duff's Device there is a single section of unrolled code which is initially partially skipped over by a switch. This trick reduces the required amount of code (to just the loop) and reduces the number of conditional jumps in the code.

The code that you presented is simply a manually unrolled loop.


Loop unrolling:

Loop unrolling is an optimisation technique in which several iterations of a loop are processed at once. So instead of:

$number_of_iterations = 128;
for ($n = 0; $n !== $number_of_iterations; ++$n) {
    do_something();
}

You use:

$number_of_iterations = 128;
for ($n = 0; $n !== (int)($number_of_iterations / 4); ++$n) {
    //Repeat do_something() four times.
    //Four is the "unrolling factor".
    do_something();
    do_something();
    do_something();
    do_something();
}

The advantage of this is speed. Conditional branching is typically a relatively expensive operation. Compared to the unrolled loop, the first loop will pass over the conditional branch four times more often.

Unfortunately, this approach is somewhat problematic. Suppose $number_of_iterations was not divisible by four - the division of labour into larger chunks would no longer work. The traditional solution to this is to have another loop which performs the work in smaller chunks until the remaining amount of work can be performed by an unrolled loop:

$number_of_iterations = 130;
//Reduce the required number of iterations
//down to a value that is divisible by 4
while ($number_of_iterations % 4 !== 0) {
    do_something();
    --$number_of_iterations;
}
//Now perform the rest of the iterations in an optimised (unrolled) loop.
for ($n = 0; $n !== (int)($number_of_iterations / 4); ++$n) {
    do_something();
    do_something();
    do_something();
    do_something();
}

This is better, but the initial loop is still needlessly inefficient: it again branches at every iteration, which is an expensive proposition. In PHP, this is as good as you can get, as far as I know.

Now enter Duff's Device.

Duff's Device:

Instead of performing a tight loop before entering the efficient unrolled zone, an alternative is to go straight to the unrolled zone, but to initially jump part way into the loop body. This is called Duff's Device.

I will now switch the language to C, but the structure of the code will remain very similar:

//Note that number_of_iterations
//must be greater than 0 for the following code to work
int number_of_iterations = 130;
//Integer division truncates fractional parts
//counter will have the value which corresponds to the
//number of times that the body of the `do-while`
//will be entered.
int counter = (number_of_iterations + 3) / 4;
switch (number_of_iterations % 4) {
    case 0: do { do_something();
    case 3:      do_something();
    case 2:      do_something();
    case 1:      do_something();
            } while (--counter > 0);
}

All of the conditional branches in the while ($number_of_iterations % 4 !== 0) from earlier have been replaced by a single computed jump (from the switch).
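For anyone who wants to check the arithmetic, here is a minimal, compilable sketch of the same structure. It assumes do_something() is a stand-in that merely counts its calls; the `calls` counter and the `run_unrolled` wrapper are hypothetical names of mine, not part of the device. It also adds a guard for non-positive counts, which the device itself does not handle.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real work: it only counts how often it runs. */
size_t calls = 0;

void do_something(void) {
    ++calls;
}

/* Calls do_something() exactly number_of_iterations times,
 * using Duff's Device with an unrolling factor of 4. */
void run_unrolled(int number_of_iterations) {
    if (number_of_iterations <= 0) {
        return; /* the device assumes at least one iteration */
    }
    /* Number of passes through the do-while, counting the partial first pass. */
    int counter = (number_of_iterations + 3) / 4;
    /* The switch jumps into the middle of the loop body on the first pass. */
    switch (number_of_iterations % 4) {
    case 0: do { do_something();
    case 3:      do_something();
    case 2:      do_something();
    case 1:      do_something();
            } while (--counter > 0);
    }
}
```

With run_unrolled(130), counter starts at 33 and the switch enters at case 2, so the first pass makes two calls and the remaining 32 passes make four calls each, for 130 in total.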


This whole analysis is predicated on the flawed notions that reducing the number of conditional branches in a region of code will always result in significantly better performance and that the compiler will not be able to perform these sorts of micro-optimisations by itself where appropriate. Both manual loop unrolling and Duff's Device should be avoided in modern code.

8
Gustav Bertram On

Your code is not actually a Duff's Device. A proper DD would have a while or do/while that is interlaced in a switch statement.

The point of a DD is to remove this bit of your code:

$n = $ITERATIONS % 8;
while ($n--) $val++;

The first step of the Duff Device is handled like a GOTO into the code:

send(to, from, count)
register short *to, *from;
register count;
{
        register n = (count + 7) / 8;
        switch(count % 8) {
        case 0:      do {     *to = *from++;
        case 7:              *to = *from++;
        case 6:              *to = *from++;
        case 5:              *to = *from++;
        case 4:              *to = *from++;
        case 3:              *to = *from++;
        case 2:              *to = *from++;
        case 1:              *to = *from++;
                } while(--n > 0);
        }
}

Say count % 8 turns out to be 5. The switch then jumps to case 5 and falls through the remaining statements (five copies in total) to the end of the do-while, after which the loop does the work in increments of 8.
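To make that trace concrete, here is a sketch in modern C. Note the assumptions: `duff_copy` is a hypothetical name, and unlike the listing above it advances both pointers, so it copies into ordinary memory rather than writing repeatedly to a single output location. It also guards against a count of zero, which the original does not.

```c
#include <stddef.h>

/* Copies count shorts from `from` to `to` with an unrolling factor of 8.
 * Assumption: the destination is ordinary memory, so `to` is advanced too. */
void duff_copy(short *to, const short *from, size_t count) {
    if (count == 0) {
        return; /* the device assumes at least one element */
    }
    /* Number of passes through the do-while, counting the partial first pass. */
    size_t n = (count + 7) / 8;
    /* Enter the loop body at the label that leaves a multiple of 8 afterwards. */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

For count = 21, n is 3 and count % 8 is 5: the first pass enters at case 5 and copies five elements, then two full passes copy eight each, giving 5 + 8 + 8 = 21.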