Are digraphs transformed by a compiler and trigraphs transformed by a preprocessor?

I'm trying to understand both trigraphs and digraphs rather than use them.

I've read that post and I understood that:

  • Converting trigraphs to corresponding characters shall always be done by the preprocessor, before the actual compilation starts.
  • Converting digraphs to corresponding characters shall be performed by the compiler.

Is this true?

There are 2 answers

chqrlie On BEST ANSWER

Trigraph sequences are indeed replaced with the corresponding characters in the first phase of translation, before the preprocessor's lexer analyses the stream of characters to produce preprocessing tokens.

The very next phase handles escaped newlines, i.e. instances of \ immediately followed by a newline, which are removed from the character stream. Note that the \ can be produced by the first phase as a replacement for the ??/ trigraph.
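A minimal sketch of this line-splicing phase, assuming a conforming C compiler (the function name spliced_value is made up for illustration). Because the backslash-newline pair is deleted before any tokens are formed, a constant split across two lines still ends up as a single token:

```c
/* The backslash-newline pair is deleted before tokenization,
   so the digits on the two lines below join into the constant 42.
   The continuation line must start in column one, otherwise the
   leading spaces would separate the digits. */
int spliced_value(void) {
    int total = 4\
2;
    return total;
}
```

Calling spliced_value() returns 42.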

The lexer then analyses the character stream to produce preprocessing tokens. [ and <: are alternate spellings for the same token, much like 1e1 and 1E1 are alternate spellings for the same constant. Hence <: is not replaced with [: it is a different sequence of characters that produces the same token.
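As a rough illustration that the two spellings are interchangeable as tokens, here is a small sketch (the function name sum_with_digraphs is hypothetical). It uses the digraphs for brackets and braces, and even mixes a digraph with a plain bracket in the same subscript:

```c
/* <: :> <% %> are digraph spellings of [ ] { }.
   Each pair behaves as the same token, so the spellings can be mixed. */
int sum_with_digraphs(void) <%
    int values<:3:> = <% 1, 2, 3 %>;
    /* values<:0] opens with a digraph and closes with a plain bracket */
    return values<:0] + values[1:> + values<:2:>;
%>
```

Calling sum_with_digraphs() returns 6, exactly as if the function had been written with [ ] { } throughout.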

Trigraphs cannot be produced by token pasting using the ## preprocessor operator in macro expansions, but digraphs can.
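To check that a pasted digraph really works as a bracket, here is a small sketch (GLUE, LBRACKET, RBRACKET and third_element are names invented for this example). Pasting < and : yields the valid token <:, whereas trigraph replacement has already finished long before ## is evaluated, so there is no equivalent trick for trigraphs:

```c
/* Token pasting runs during macro expansion, after tokenization.
   Pasting '<' and ':' forms the digraph token <:, and ':' with '>'
   forms :>, so both can be used wherever [ and ] are expected. */
#define GLUE(a, b) a##b
#define LBRACKET GLUE(<, :)
#define RBRACKET GLUE(:, >)

int third_element(void) {
    int values LBRACKET 3 RBRACKET = {10, 20, 30};
    return values LBRACKET 2 RBRACKET;
}
```

Calling third_element() returns 30.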

Here is a small sample program to illustrate this process, including the special handling of the ??/ trigraph, which expands to \ and can therefore be used in the middle of a digraph split across two lines:

#include <stdio.h>

#define STR(x) #x
#define xSTR(x) STR(x)
#define glue(a,b) a##b

int main() {
    puts(STR(??!));
    puts(STR('??!'));
    puts(STR("??!"));

    puts(STR(<:));
    puts(STR('<:'));
    puts(STR("<:"));

    puts(STR(<\
:));
    puts(STR(<??/
:));
    puts(STR('<\
:'));
    puts(STR("<\
:"));

    puts(STR(glue(<,:)));
    puts(xSTR(glue(<,:)));
    return 0;
}

Output:

chqrlie $ make lexing && ./lexing
clang -O3 -funsigned-char -std=c11 -Weverything -Wwrite-strings  -lm -o lexing lexing.c
lexing.c:8:14: warning: trigraph converted to '|' character [-Wtrigraphs]
    puts(STR(??!));
             ^
lexing.c:9:15: warning: trigraph converted to '|' character [-Wtrigraphs]
    puts(STR('??!'));
              ^
lexing.c:10:15: warning: trigraph converted to '|' character [-Wtrigraphs]
    puts(STR("??!"));
              ^
lexing.c:18:15: warning: trigraph converted to '\' character [-Wtrigraphs]
    puts(STR(<??/
              ^
4 warnings generated.
|
'|'
"|"
<:
'<:'
"<:"
<:
<:
'<:'
"<:"
glue(<,:)
<:
rici On

Digraphs are not "converted to the corresponding character." The string literal "<:" contains the two characters < and : (plus a null terminator). Contrast that with the string "??(", which contains only the single character [ (plus a null terminator) if you have a compiler which supports trigraphs.

<: is simply a token with exactly the same syntactic significance as [. But it is never converted to [. If you pass it to the stringify operator #, you will get the string "<:".
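A quick way to verify both points is to measure the literals and compare the stringified spelling. This is only a sketch and the function names are invented for the example. It deliberately avoids trigraphs so that it behaves the same whether or not the compiler has them enabled:

```c
#include <stddef.h>
#include <string.h>

#define STR(x) #x

/* The literal holds '<', ':' and the terminating null: three bytes. */
size_t digraph_literal_size(void) { return sizeof "<:"; }

/* For comparison, a literal holding only a bracket is two bytes. */
size_t bracket_literal_size(void) { return sizeof "["; }

/* Stringifying keeps the spelling: STR(<:) is "<:", not "[". */
int stringify_keeps_spelling(void) {
    return strcmp(STR(<:), "<:") == 0 && strcmp(STR(<:), "[") != 0;
}
```

Here digraph_literal_size() returns 3, bracket_literal_size() returns 2, and stringify_keeps_spelling() returns 1.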