The questions below are about character sets (C11, 5.2.1 Character sets) and the mapping performed in translation phase 1 (C11, 5.1.1.2 Translation phases, 1).
The list:
- Can the source character set, as an extension, include control characters representing something other than horizontal tab, vertical tab, and form feed? If yes, does a diagnostic need to be produced when such control characters are used in, e.g., a string literal?
Example: GCC/LLVM/MSVC accept many control characters in a string literal without issuing a diagnostic, AND they keep such control characters in the string literal after the mapping in translation phase 1 is done (meaning that GCC/LLVM/MSVC support these control characters in the source character set). Is it OK that no diagnostic is produced?
Demo:
# GCC
# test \x00
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x00' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
gcc t999.c -c -std=c11 -pedantic -Wall -Wextra -S ;\
grep 's:' t999.s -A1
t999.c:1:12: warning: null character(s) preserved in literal
    1 | char x[] = "x x"; int s = sizeof x;
      |            ^
s:
.long 4
# here we see that a diagnostic is produced, sizeof x is 4
# test \x01
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x01' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
gcc t999.c -c -std=c11 -pedantic -Wall -Wextra -S ;\
grep 's:' t999.s -A1
s:
.long 4
# here we see that no diagnostic is produced, sizeof x is 4
# MSVC
# test \x00
# see below
# test \x01
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x01' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
cl t999.c /c /std:c11 /FA /nologo ;\
grep -P '^s' t999.asm
s DD 04H
# here we see that no diagnostic is produced, sizeof x is 4
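To repeat the test for every control character rather than only \x00 and \x01, the test files can also be generated from C. The sketch below is a hypothetical helper: the names write_test_file and write_all_control_tests and the t_XX.c file naming are assumptions, not part of the demos. It writes the same bytes as the echo + dd pipeline, with the byte under test at offset 13, and opens the file in binary mode so that no newline translation happens on Windows.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the same bytes as the echo + dd pipeline,
   i.e. the line
       char x[] = "x?x"; int s = sizeof x;
   plus echo's trailing newline, where ? is the raw byte c at offset 13.
   Returns 0 on success, -1 on any I/O error. */
int write_test_file(const char *path, unsigned char c)
{
    static const char prefix[] = "char x[] = \"x";            /* offsets 0..12 */
    static const char suffix[] = "x\"; int s = sizeof x;\n";  /* offsets 14.. */

    /* Same position as dd seek=13: the byte under test is the 14th byte. */
    assert(sizeof prefix - 1 == 13);

    FILE *f = fopen(path, "wb");  /* binary mode: no newline translation */
    if (f == NULL)
        return -1;

    fputs(prefix, f);
    fputc(c, f);                  /* raw control character, not an escape */
    fputs(suffix, f);

    int ok = !ferror(f);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical driver: writes t_00.c .. t_1f.c and t_7f.c, one file per
   C0 control character plus DEL. Returns the number of failed writes. */
int write_all_control_tests(void)
{
    int failures = 0;
    char path[16];

    for (int c = 0x00; c <= 0x7f; c++) {
        if (c > 0x1f && c != 0x7f)
            continue;             /* skip printable characters */
        snprintf(path, sizeof path, "t_%02x.c", c);
        if (write_test_file(path, (unsigned char)c) != 0)
            failures++;
    }
    return failures;
}
```

For example, write_test_file("t999.c", 0x01) reproduces the \x01 test, and each generated file can be compiled with the same gcc and cl command lines (substituting the file name). The 0x0A file is expected to fail to compile, since a raw new-line cannot appear inside a string literal.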
- C11, 5.1.1.2 Translation phases, 1:
Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary.
A simple question: is "mapping to nothing" still a mapping? E.g. X => <nothing>. Or perhaps it is not a "mapping" but "skipping" (or "removal")? Example: in "x<null>y" (in binary 22 78 00 79 22), MSVC skips/removes the null character without producing a diagnostic (making sizeof produce 3 instead of 4). Is that OK?
Demo:
# MSVC
# test \x00
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x00' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
cl t999.c /c /std:c11 /FA /nologo ;\
grep -P '^s' t999.asm
s DD 03H
# here we see that no diagnostic is produced, sizeof x is 3
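To make the "mapping versus removal" distinction concrete, here is a hypothetical model of translation phase 1 restricted to the bytes between the quotes. Nothing in it is taken from GCC or MSVC: the names map_phase1, nul_policy, and literal_sizeof are invented for illustration, and the two policies only mimic the results seen in the demos (GCC keeps the NUL and warns; MSVC drops it silently). Every other byte, including 0x01, maps to itself, which matches the \x01 results where both compilers give sizeof x == 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical NUL-handling policies; they model only the observable
   results of the demos, not any compiler's implementation. */
enum nul_policy {
    NUL_PRESERVE_WARN,  /* keep the byte and count a diagnostic (GCC demo) */
    NUL_DROP            /* map the byte to nothing, silently (MSVC demo) */
};

/* Maps the bytes between the quotes of a string literal to "source
   characters" (modeled as single bytes). out must have room for len bytes.
   Returns the number of characters written and adds one to *diagnostics
   for every NUL that is kept. */
size_t map_phase1(const unsigned char *in, size_t len, unsigned char *out,
                  enum nul_policy policy, int *diagnostics)
{
    size_t n = 0;

    assert(diagnostics != NULL);
    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x00) {
            if (policy == NUL_DROP)
                continue;         /* X => <nothing> */
            ++*diagnostics;       /* kept, but diagnosed */
        }
        out[n++] = in[i];         /* identity mapping for every other byte */
    }
    return n;
}

/* sizeof the resulting char array: the mapped characters plus the
   terminating null appended in translation phase 7 (C11 6.4.5p6),
   assuming one byte per character and no escape sequences. */
size_t literal_sizeof(size_t mapped_chars)
{
    return mapped_chars + 1;
}
```

With the bytes 78 00 79 from the example above, NUL_PRESERVE_WARN yields 3 mapped characters, one diagnostic, and a sizeof of 4, while NUL_DROP yields 2 mapped characters, no diagnostic, and a sizeof of 3, which is the MSVC result. In this model, dropping is just a mapping whose image for NUL is the empty sequence; whether C11 5.1.1.2 permits that is exactly the question being asked.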