How can Duff's device code be compiled?

1.9k views Asked by At

I understood why Duff's device is faster than normal loop code which can be unrolled but is not optimized. But I can't understand how the code can be compiled yet.
I guess it's a trick about the switch syntax. But not anymore.

How can do while sentence exist in switch sentence? Very weird.
Is there anyone who can explain this?

Edit: Another question. Why did duff use 8? It could be 16, 65536 or whatever. Because of code size? Is there another reason? For example, cache or pipelining benefits.

5

There are 5 answers

1
Michael Burr On BEST ANSWER

A simple explanation of why Duff's Device compiles is that the syntax of the switch statement isn't particularly specific about the form that the switch statement block might need to take. There are a few restrictions, and a couple of things permitted in the controlled statement that aren't permitted outside a switch (the case and default labels). But other than that, the controlled statement is just any other statement, with the likelihood that there are labels for the switch to target.

Here's the syntax from C99:

switch ( expression ) statement

Beyond the syntax, the standard imposes a few constraints:

  • the controlling expression must have an integer type
  • there are restrictions about where VLAs can occur in the switch statement
  • case label expressions must be integer constant expressions
  • there cannot be duplicate case label expressions or default labels

Other than that, any construct permitted in a statement block should be permitted in the controlled statement (with the addition that case and default labels are OK). Remember that case and default are just labels that the switch jumps to based on the controlling expression and the case label expressions. As Potatoswatter says, switch is just a computed goto. So just like goto can jump into the middle of a loop, so can switch.

Also, I think that the cases where you might see a benefit from Duff's Device are pretty rare today (I think they were rare even in the 1980's). Don't forget that Tom Duff himself said the following in his description:

  • "Disgusting, no?"
  • "I feel a combination of pride and revulsion at this discovery."

Even more than when it was initially described, Duff's Device should be considered more of a curiosity than a tool to use.

0
R.. GitHub STOP HELPING ICE On

While it's true that Duff's Device is outdated for its original purpose, it's still useful for special purposes, like a state machine that normally cycles repeatedly through N states, but sometimes needs to return to the caller and later be resumed at the state where it left off. Putting the switch statement outside the loop and the case labels inside the loop (I would take this as the definition of a "Duff's device") then makes a lot of sense.

With that said, do not use Duff's devices to "optimize by hand". Putting what are effectively "goto labels" all over the place will not help the compiler optimize.

1
AudioBubble On

The "how it works" is simple enough.

Both C and C++ are compiled languages, and normally compiled to the platforms machine code. Machine code has no concept of block structures - all block structures must be translated to a form that uses (in effect) some mix of unconditional and conditional gotos.

The C syntax rules allow the switch statement and loop to be combined in a way which is not a true hierarchical block structure, but which tangles control flow. So long as the compiler can cope with this (which any good compiler should) there is no problem in the underlying machine code. The result will be "spaghetti", but generated machine code that has been through an optimiser is always spaghetti - it's not meant to be human readable, so it's not an issue. The issue here is that the source code is spaghetti too, even though the gotos have been "hidden".

Note - although any good compiler should cope with Duffs device, as others have already commented, that doesn't mean it will cope well enough to optimise it properly - only well enough to generate correct executable code. This is one of these old strange idioms that once had a purpose, but which is more likely now to confuse your compiler and sabotage it's ability to generate efficient code.

EDIT

The following is related to Duffs device, and may help illustrate the basic idea...

switch (count & 1)
{
  case 0 : goto lbl0;
  case 1 : goto lbl1;
}

lbl0:

while (count != 0)
{
  handle_one ();
  count--;
lbl1:
  handle_one ();
  count--;
}

Having case clauses inside the loop is conceptually no different to having goto-target labels inside the loop, as above.

Warning - this is purely for illustration of an idea, and should not be copied in real life code.

0
Potatoswatter On

switch is just a computed goto. So, there are several labels inside the loop, and a switch statement outside the loop. The switch decides which label to go to, and gotos there, inside the loop.

Once execution is inside the loop, it goes on looping until the loop relinquishes control.

It's actually very straightforward… and shouldn't be used unless it is the most straightforward alternative.

Don't believe anyone who tells you it makes anything fast.

I would even say to stop listening to anything they say at all, even if it's your teacher.

As for compilers, they break things down to generic control-flow graphs and don't care about switch vs. if vs. while. They are all if ( … ) goto …; else goto …; to the compiler.

0
Tony Delroy On

If we take the implementation from the Wikipedia article you link...

send(to, from, count)
register short *to, *from;
register count;
{
        register n=(count+7)/8;
        switch(count%8){
        case 0:       do{     *to = *from++;
        case 7:               *to = *from++;
        case 6:               *to = *from++;
        case 5:               *to = *from++;
        case 4:               *to = *from++;
        case 3:               *to = *from++;
        case 2:               *to = *from++;
        case 1:               *to = *from++;
                }while(--n>0);
        }
}

...and replace the "high-level" do / while loop with the assembly-level if / goto that the compiler really reduces it to...

send(to, from, count)
register short *to, *from;
register count;
{
        register n=(count+7)/8;
        switch(count%8){
        case 0:     do_label: *to = *from++;
        case 7:               *to = *from++;
        case 6:               *to = *from++;
        case 5:               *to = *from++;
        case 4:               *to = *from++;
        case 3:               *to = *from++;
        case 2:               *to = *from++;
        case 1:               *to = *from++;
                if (--n>0) goto do_label;
        }
}

...it may help you perceive that - in this usage where the do/while scope didn't introduce any local variables - there's really nothing more to the do-while than a jump back to case 0 that bypasses the switch and hence the need to evaluate count % 8 (% is a pretty expensive operation in the scheme of things).

Hope that helps it click, but may not...? :-)

Why did duff use 8? It could be 16, 65536 or whatever. Because of code size? Is there another reason? For example, cache or pipelining benefits.

Just a case of diminishing returns. Having to do the --n > 0 check and a jump after every 8 data copies isn't a big percentage overhead, but the size of the code (both in source and as compiled code in cache) is still pretty tight. Maybe it'd be 90 or 95% work vs overheads, which was evidently good enough. Further, to illustrate and share the concept with others Tom Duff may have prefered it to be about a typical 80x25 terminal's worth of code rather than a page or 10.