C++: implementation-defined accepted physical source file characters


According to the C++14 standard,

§2.2.1.1 [...] The set of physical source file characters accepted is implementation-defined. [...] Any source file character not in the basic source character set is replaced by the universal-character-name that designates that character. [...]

Does this mean that the C++ standard provides no implementation-defined or conditionally-supported way to handle non-UCS/Unicode characters? For example, a physical source file encoding that includes characters with no corresponding UCS code point.

The only thing I can think of is that, if that were the case (the compiler supports non-UCS characters through non-UCS encodings), the compiler would have to use the UCS private-use ranges to map those physical characters. But that solution doesn't fit the "universal-character-name that designates that character" part, because code points in the private-use ranges don't designate any specific character at all.

There is 1 answer

AndyG On

Not really, but kind of. The important part of the [lex.phases] quote, IMO, is the following:

Physical source file characters are mapped, [...], to the basic source character set

Only the basic source character set is guaranteed to be supported; everything else must somehow be mapped to it. Its graphical characters, per [lex.charset], are:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

But the standard also says that this mapping is done only "if necessary", and it goes on to say:

The set of physical source file characters accepted is implementation-defined.

So I suppose that ultimately allows a compiler to do whatever it wants, as long as it supports at least the basic source character set.
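
To make the universal-character-name part a bit more concrete, here is a minimal sketch. It spells a non-basic character using only basic source characters, so it does not depend on which physical source encodings a given compiler accepts. The helper names `code_units` and `code_unit_at` are made up for this illustration and come from neither the standard nor the question.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.

// Length of a string literal in code units, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// Value of the i-th code unit as an unsigned number.
// For u8 literals, CharT is char up to C++17 and char8_t from C++20 on.
template <typename CharT, std::size_t N>
constexpr unsigned code_unit_at(const CharT (&s)[N], std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// U+00E9 (e with acute accent) written as a universal-character-name.
// In a u8 literal it is encoded as the two UTF-8 bytes 0xC3 0xA9.
static_assert(code_units(u8"\u00E9") == 2, "U+00E9 takes two UTF-8 code units");
static_assert(code_unit_at(u8"\u00E9", 0) == 0xC3, "lead byte");
static_assert(code_unit_at(u8"\u00E9", 1) == 0xA9, "continuation byte");
```

If a compiler accepts, say, a UTF-8 or Latin-1 source file containing that character typed directly, phase 1 is supposed to make it equivalent to the `\u00E9` spelling above. Whether it accepts such a file at all is the implementation-defined part.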