C++ translation phase 1: the set of accepted source file characters before C++11 vs. since C++11 (prior to C++23)


cppreference's page on phases of translation, under Phase 1 (prior to C++23), has a note in step 2 that:

The set of source file characters accepted is implementation-defined (since C++11)

This is also echoed in a working draft of the spec (for instance, for C++20).

Before C++23, it has always appeared to be implementation-defined how the bytes of the source file get mapped to characters of the basic source character set. Given that, what does the rule above, added in C++11, achieve? I.e., what is the net change from having it vs. not having it? This may just be a reading-comprehension question, but it seems unclear to me, because the page then goes on to say:

Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.

For instance, does the C++11 addition mean that it is up to the implementation to decide which characters to even consider mapping to the basic source character set or to a universal character name (i.e., that it is allowed to skip characters in the source file and not translate them at all)? Otherwise, given the point just above that every character is mapped either to a basic source character or to a universal character name (or an equivalent form), it is not clear to me what this C++11 rule achieves beyond step 1 of Phase 1 on that cppreference page, namely:

The individual bytes of the source code file are mapped (in an implementation-defined manner) to the characters of the basic source character set

There is 1 answer

Answer by Caleth (best answer)

Prior to C++11, the only directly available characters were those in the basic source character set, and other characters were required to be escaped.

Since C++11, it is implementation-defined which characters beyond the basic source character set are available without escaping, and which must be escaped.

Since C++23, all Unicode characters are available without escaping.
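To make the "escaped" vs. "without escaping" distinction concrete, here is a minimal sketch. It is only an illustration, not something taken from the standard or from cppreference: the helper names `code_units` and `byte_at` are made up for this example. The source text uses only basic source characters, so it does not depend on which extra characters a given compiler accepts in Phase 1. The universal character names `\u00e9` and `\U0001F600` are the escaped spellings that have been usable in literals since C++98.

```cpp
#include <cstddef>

// Hypothetical helper: number of UTF-8 code units in a u8 string literal,
// not counting the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// Hypothetical helper: value of the i-th code unit as an unsigned number.
// Works whether u8 literals are arrays of char (C++11 to C++17)
// or arrays of char8_t (C++20 and later).
template <typename CharT, std::size_t N>
constexpr unsigned byte_at(const CharT (&s)[N], std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// Escaped spelling of U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
// Its UTF-8 encoding is the two bytes 0xC3 0xA9.
static_assert(code_units(u8"\u00e9") == 2, "U+00E9 is two UTF-8 code units");
static_assert(byte_at(u8"\u00e9", 0) == 0xC3u, "first byte of U+00E9");
static_assert(byte_at(u8"\u00e9", 1) == 0xA9u, "second byte of U+00E9");

// Escaped spelling of a character outside the BMP needs the \U form.
static_assert(code_units(u8"\U0001F600") == 4, "U+1F600 is four UTF-8 code units");

// The unescaped spelling (typing the character itself between the quotes)
// is deliberately not used here: whether a compiler accepts it, and how it
// decodes the bytes, was implementation-defined before C++23.
```

Typing the character directly between the quotes would be expected to yield the same bytes, but only if the implementation accepts that character in the source encoding it assumes. In practice that is steered by options such as GCC's `-finput-charset` or MSVC's `/utf-8`.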