What is the "trick" to writing a Quine?

11.4k views Asked by At

I read Ken Thompson's classic paper Reflections on Trusting Trust in which he prompts users to write a Quine as an introduction to his argument (highly recommended read).

A quine is a computer program which takes no input and produces a copy of its own source code as its only output.

The naive approach is simply to want to say:

print "[insert this program's source here]"

But one quickly sees that this is impossible. I ended up writing one myself using Python but still have trouble explaining "the trick". I'm looking for an excellent explanation of why Quines are possible.

2

There are 2 answers

0
Conrad Irwin On BEST ANSWER

The normal trick is to use printf such that the format-string represents the structure of the program, with a place-holder for the string itself to get the recursion you need:

The standard C example from http://www.nyx.net/~gthompso/quine.htm illustrates this quite well:

char*f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";main(){printf(f,34,f,34,10);}

edit: After writing this, I did a bit of searching: http://www.madore.org/~david/computers/quine.html gives a very good, more theoretical, description of what exactly quines are and why they work.

0
luser droog On

Here's one I wrote that uses putchar instead of printf; thus it has to process all its own escape codes. But it is %100 portable across all C execution character sets.

You should be able to see that there is a seam in the text representation that mirrors a seam in the program text itself, where it changes from working on the beginning to working on the end. The trick to writing a Quine is getting over this "hump", where you switch to digging your way out of the hole! Your options are constrained by the textual representation and the language's output facilities.

#include <stdio.h>

void with(char *s) {
    for (; *s; s++) switch (*s) {
    case '\n': putchar('\\'); putchar('n'); break;
    case '\\': putchar('\\'); putchar('\\'); break;
    case '\"': putchar('\\'); putchar('\"'); break;
    default: putchar(*s);
    }
}
void out(char *s) { for (; *s; s++) putchar(*s); }
int main() {
    char *a[] = {
"#include <stdio.h>\n\n",
"void with(char *s) {\n",
"    for (; *s; s++) switch (*s) {\n",
"   case '\\",
"n': putchar('\\\\'); putchar('n'); break;\n",
"   case '\\\\': putchar('\\\\'); putchar('\\\\'); break;\n",
"   case '\\\"': putchar('\\\\'); putchar('\\\"'); break;\n",
"   default: putchar(*s);\n",
"    }\n}\n",
"void out(char *s) { for (; *s; s++) putchar(*s); }\n",
"int main() {\n",
"    char *a[] = {\n",
NULL }, *b[] = {
"NULL }, **p;\n",
"    for (p = a; *p; p++) out(*p);\n",
"    for (p = a; *p; p++) {\n",
"   putchar('\\\"');\n",
"   with(*p);\n",
"   putchar('\\\"'); putchar(','); putchar('\\",
"n');\n",
"    }\n",
"    out(\"NULL }, *b[] = {\\",
"n\");\n",
"    for (p = b; *p; p++) {\n",
"   putchar('\\\"');\n",
"   with(*p);\n",
"   putchar('\\\"'); putchar(','); putchar('\\",
"n');\n",
"    }\n",
"    for (p = b; *p; p++) out(*p);\n",
"    return 0;\n",
"}\n",
NULL }, **p;
    for (p = a; *p; p++) out(*p);
    for (p = a; *p; p++) {
    putchar('\"');
    with(*p);
    putchar('\"'); putchar(','); putchar('\n');
    }
    out("NULL }, *b[] = {\n");
    for (p = b; *p; p++) {
    putchar('\"');
    with(*p);
    putchar('\"'); putchar(','); putchar('\n');
    }
    for (p = b; *p; p++) out(*p);
    return 0;
}

A common trick is to jump start the quine by writing a program to read a textfile and output an array of numbers. Then you modify it to use a static array, and run the first program against the new (static array) program, producing an array of number that represents the program. Insert that into the static array, run it again until it settles down, and that gets you a quine. But, it's tied to a specific character set (== not 100% portable). A program like the above (and not the classic printf hack) will work the same on ASCII or EBCDIC (the classic printf hack fails in EBCDIC because it contains hard-coded ASCII).


edit:

Reading the question again, carefully (finally), it appears you're actually looking for more philosophy less technique. The trick that lets you out of the infinite regress is the two-fer. You've got to get both the encoded program and the expanded program out of the same data: using the same data 2 ways. This data thus only describes the part of the program surrounding its future manifestation, the frame. The image within this frame is a straight copy of the original.

This is how you would naturally go about producing a recursive drawing by hand: the tv of a tv of tv. At some point you get tired and just sketch some glare over the screen, because the recursion has been sufficiently established.


edit:

I'm looking for an excellent explanation of why Quines are possible.

The "possibility" of a Quine goes into the depths of the mathematical revolutions of the 19th and 20th centuries. The "classic" quine by W. V. O. Quine, is the sequence of words (IIRC)

yields false when appended to itself

which is a paradox, akin to David's request for something that "makes me happy when sad, and makes me sad when happy" answered by the medallion inscribed on both sides: "this too shall pass".

The same sort of knot was investigated by the Pioneers of modern mathematical logic such as Frege, Russell and Whitehead, Łukasiewicz, and of course, our boys Turing, Church and Thue. The trick that makes it possible to transpose the Quine from the realm of wordplay to a programmatic demonstration (untwisting the paradox part along the way), was Gödel's method of encoding the arithmetic operations themselves as numbers, so an entire mathematical expression can be encoded into a single (enormous) integer. In particular, a mathematical function that performs a decoding of this representation can be expressed in the same (numerical) form. This number (a Gödel-encoded function) is both code and data.

This power-trio (Code, Representation, Data), can be transposed to different repesentations. By choosing a different Representation (or a chain like: bytes-> ASCII-> hexadecimal-> integer), alters the behavior of the Code, which alters the appearance of the Data.