IOCCC 1986/wall.c - why does TCC beat GCC in handling earlier C code?


Another pearl from the early IOCCC years is Larry Wall's 1986 entry:

http://www.ioccc.org/years.html#1986 (wall)

I suspect there is no C compiler today that can actually compile that source straight-out-of-the-box, due to the severe preprocessor abuse it contains:

  • Latest TDM-GCC 9.2.0 set to ANSI mode fails
  • Last TCC 0.9.27 fails

However, after extracting the preprocessed code from the obfuscated original (always with GCC's cpp -traditional), both TCC and GCC manage to compile it. GCC's effort goes to waste, though: its program crashes as soon as it begins decoding its obfuscated intro text (I won't spoil that here for those who want to delve in themselves!).

TCC, on the other hand, merely warns about implicit declarations of system(), read() and write() and quickly produces a working program.

I stepped through the GCC-compiled code with GDB, and that is how I found that it crashes on the second pass of a for loop that walks through a text string in order to decode it:

[Inferior 1 (process 9460) exited with code 030000000005]

That process ID doesn't matter as it represents the debug build executable that crashes. The exit code, however, stays the same.

Clearly, TCC is better suited to such IOCCC entries than GCC. The latter still manages to compile and even run some entries, but for tough cases like this one, TCC is hard to beat. Its only drawback is that it falls short when preprocessing extremely abusive code such as this example: it leaves spaces between certain preprocessed tokens and thus fails to concatenate them into the author's intended C keywords in places, whereas GCC's cpp handles all of them.

My question is, as philosophical or even rhetorical as it sounds:

What is it in modern GCC that makes it either fail to compile, or produce unusable code when it does compile, earlier C programs, unlike TCC?

Thanks in advance for all the feedback, I appreciate it!

NOTE: I am using Windows 10 version 2004 with WSL 2; GCC fails in both the Windows and the WSL 2 environments. I am planning to compile TCC in WSL 2 for comparisons in that environment too.

PS: I immensely enjoyed this program when it finally executed as intended. It undoubtedly deserves that year's "grand prize in most well-rounded in confusion"!


There are 2 answers

Antti Haapala -- Слава Україні (best answer)

"What is it in modern GCC that makes it either fail to compile, or produce unusable code when it does compile, earlier C programs, unlike TCC?"

Undefined behaviour, which was more the rule than the exception back then. Just look at this classic 1984 entry.


C compilers nowadays compile C as set forth in the ISO/IEC 9899 standard, whose first edition was published in 1990 (the equivalent ANSI standard in 1989). The program predates that. Notably, it uses some really odd traditional preprocessor syntax that is invalid in C89, C99, C11 and so forth.

The idea generally is that you do not want to allow this syntax by default, because a traditional preprocessor would not produce code compatible with a modern preprocessor. For example, a traditional preprocessor would replace macro parameters inside string literals too:

#define greeting(thing) puts("Hello thing")
main() {
    greeting(world!!!);
}

With a traditional preprocessor this expands to

main() {
    puts("Hello world!!!");
}

The program is also valid C89, although bad style; but a standard preprocessor leaves the contents of the string untouched and produces

main() {
    puts("Hello thing");
}

So it is best to error out at any sign of non-standard preprocessor usage, otherwise the code could be subtly broken because such substitutions would not have been made.


Another thing is writable strings. The deobfuscation code directly attempts to modify string literals. C89 specified that this has undefined behaviour; in a GCC-compiled program it causes a crash because the literals are mapped into read-only pages. Older GCC versions supported -fwritable-strings, but it was deprecated a long time ago because it was buggy anyway.


I got the program running with GCC 9.3.0 by making these minimal changes. -traditional is no longer supported for compilation, so you must preprocess first and compile after that:

gcc -traditional -E wall.c > wall_preprocessed.c

perl -pi -e '/^[^#]/ && s/(".*?")/(char[]){$1}/g'  wall_preprocessed.c
# thanks Larry ;)

gcc wall_preprocessed.c

I.e. I wrapped everything that looks like a string literal "..." and is not on a preprocessor line directive (a line starting with #) in a (char[]){"..."} array compound literal. Compound literals inside a function body have automatic storage duration tied to the enclosing block, and non-const-qualified ones are writable.

zwol

The crash is caused by the program writing to the contents of a string literal. "Traditional" C compilers would often put these in writable memory, but on modern systems they're basically always in read-only memory. I'm surprised it doesn't crash with TCC.

Here is a version of the program that compiles without complaint with GCC on my computer (even with a very high level of warnings) and appears to work correctly. I have made as few changes as possible. As usual with the best IOCCC entries, preprocessing and reformatting barely helped at all, although they did remove some traps for casual reverse engineers.

The program assumes system invokes a Bourne-style shell, and a Unix-style stty command is available to that shell. Also, it will malfunction (probably in an amusing fashion) if the execution character set isn't ASCII.

#include <stdlib.h>
#include <unistd.h>

const char o[] = ",,B3-u;.(&*5., /(b*(1\036!a%\031m,,,,,\r\n";

static char *ccc (char *cc)
{
    char *cccc = cc;
    int c;
    for (; (c = (*cc)); *cc++ = c)
    {
        switch (0xb + (c >> 5))
        {
        case '\v':
            break;
        case '\f':
            switch (c)
            {
            case (8098) & ('|' + 3):
                c = (8098) >> ('\n' - 3);
                break;
            case (6055) & ('|' + 3):
                c = (6055) >> ('\n' - 3);
                break;
            case (14779) & ('|' + 3):
                c = (14779) >> ('\n' - 3);
                break;
            case (10682) & ('|' + 3):
                c = (10682) >> ('\n' - 3);
                break;
            case (15276) & ('|' + 3):
                c = (15276) >> ('\n' - 3);
                break;
            case (11196) & ('|' + 3):
                c = (11196) >> ('\n' - 3);
                break;
            case (15150) & ('|' + 3):
                c = (15150) >> ('\n' - 3);
                break;
            case (11070) & ('|' + 3):
                c = (11070) >> ('\n' - 3);
                break;
            case (15663) & ('|' + 3):
                c = (15663) >> ('\n' - 3);
                break;
            case (11583) & ('|' + 3):
                c = (11583) >> ('\n' - 3);
                break;
            }
            break;
        default:
            c += o[c & (38 - 007)];
            switch (c -= '-' - 1)
            {
            case 0214:
            case 0216:
                c += 025;
                /*fallthru*/
            case 0207:
                c -= 4;
                /*fallthru*/
            case 0233:
                c += ' ' - 1;
            }
        }
        c &= 'z' + 5;
    }
    return cccc;
}

int
main (void)
{
    char c[] = "O";
    char cccc[] = "dijs QH.soav Vdtnsaoh DmfpaksoQz;kkt oa, -dijs";
    char ccccc[] = ";kkt -oa, dijszdijs QQ";
    system (ccc (cccc));
    for (;;)
    {
        read (0, c, 1);
        *c &= '~' + 1;
        write (1, ccc (c), '\0');
        switch (*c)
        {
        case 4:
            system (ccc (ccccc));
            return 0;
        case 13:
            write (1, o + ' ', 3);
            break;
        case 127:
            write (1, "\b \b", 3);
            break;
        default:
            write (1, c, 1);
            break;
        }
    }
    return 0;
}