How does a compiler handle a non-zero null pointer value in C?


This answer explains null pointers well. The last paragraph under "Null Pointers" says:

If the underlying architecture has a null pointer value defined as address 0xDEADBEEF, then it is up to the compiler to sort this mess out.

Now suppose some architecture internally defines the null pointer value as non-zero. How can these if statements remain valid? How does the compiler handle them?

    if (!pointer)
    if (pointer == NULL)
    if (pointer == 0)

After all, when a null pointer constant is assigned to a pointer, you get a null pointer, and a null pointer constant is always 0 or (void *)0. Further, this answer says:

So 0 is a null pointer constant. And if we convert it to a pointer type we will get a null pointer that might be non-all-bits-zero for some architectures.

I am really unable to understand how this literal 0 becomes non-all-bits-zero when initialized to a pointer. Isn't this a simple initialization? Moreover, if my null pointer value is non-zero, how can the three if statements above check for a null pointer? Aren't we comparing a non-zero null pointer value with the literal 0 here?

Answer by Eric Postpischil (best answer)

if (!pointer)

If the C implementation used the value 0xDEADBEEF for a null pointer, the compiler would compile if (!pointer) to code such as:

    compare             pointer, #0xDEADBEEF
    branch-if-not-equal else-clause

if (pointer == 0)

An integer constant zero qualifies as a “null pointer constant” (C 2018 6.3.2.3 3). When a pointer is compared to a null pointer constant, the null pointer constant is converted to the type of the pointer (6.5.9 5). The compiler would implement this conversion by producing 0xDEADBEEF for the resulting pointer. Then it would compare pointer to 0xDEADBEEF and branch accordingly.

Simply put, just because the character “0” appears in source code does not mean the compiler has to use zero in the instructions it generates. It generates whatever instructions and values it needs to get the job done.
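As a small illustration of the point above, the three spellings of the test from the question can be placed side by side. This is only a sketch: the helper names null_tests_agree and is_null are made up for this example, and the code does not depend on what bit pattern the implementation uses for a null pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when all three spellings of the null
   test give the same answer for p. On a conforming implementation this
   holds for every pointer value, whatever bits represent a null pointer. */
bool null_tests_agree(const void *p)
{
    bool by_not  = !p;           /* !p is defined as (0 == p) */
    bool by_null = (p == NULL);  /* NULL expands to a null pointer constant */
    bool by_zero = (p == 0);     /* 0 is converted to a null pointer first */
    return by_not == by_null && by_null == by_zero;
}

/* Hypothetical helper: returns true when p is a null pointer. */
bool is_null(const void *p)
{
    return p == 0;
}
```

In each comparison the compiler converts the constant to pointer type before comparing, so none of the three forms looks at whether the stored bits happen to be zero.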

I am really unable to understand how this literal 0 becomes non-all-bits-zero when initialized to a pointer.

There is nothing about the character “0” that forces a compiler to give it a value of zero. The code for “0” is 48 in ASCII and 240 in EBCDIC, so the compiler is not starting with a value of zero when it processes this or other characters. Normally, when processing numerals, it has to read the digits and do some arithmetic to calculate the numbers represented by the numerals. It is that software that causes “0” to stand for the value zero or that causes “23” to stand for the value twenty-three. To make “0” represent a null pointer, the software in the compiler simply substitutes the internal value of a null pointer wherever “0” is used in a context for a pointer.
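The arithmetic that turns digit characters into a number can be sketched roughly as follows. The function name numeral_value is invented for this example, it handles only unsigned decimal numerals, and it ignores overflow; a real compiler's lexer does considerably more.

```c
/* Hypothetical sketch: converts a run of decimal digit characters to the
   number it denotes, stopping at the first non-digit. The character '0'
   has a non-zero code (48 in ASCII), so the value zero only appears after
   subtracting '0'. C guarantees the codes for '0'..'9' are contiguous. */
unsigned long numeral_value(const char *s)
{
    unsigned long value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (unsigned long)(*s - '0');
        s++;
    }
    return value;
}
```

The value zero comes from this arithmetic, not from the character itself, which is why the compiler is free to map the numeral to something else in a pointer context.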

For example, in void *x = 0;, the compiler might initially convert “0” to zero, but this will be part of a data structure that also says this is a token or integer constant expression that currently has the value zero. When the compiler sees this integer constant expression is being used to initialize a pointer, it will change the value, and it will generate code that initializes the pointer to the internal value for a null pointer.
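The bookkeeping described above might be modeled, very loosely, like this. Everything here is an assumption made for illustration: the target whose null pointers are stored as 0xDEADBEEF, the struct constant record, and both function names. No real compiler is claimed to work this way internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical target: null pointers are stored as 0xDEADBEEF. */
#define TARGET_NULL_BITS UINT32_C(0xDEADBEEF)

/* Toy record for a constant inside the compiler: its value, plus a flag
   saying whether it is still an integer constant expression. */
struct constant {
    long value;
    bool is_integer_constant_expr;
};

/* Toy conversion of a constant to pointer type. A null pointer constant
   yields the target's null bits. Any other conversion is
   implementation-defined in C; an identity mapping is assumed here. */
uint32_t pointer_bits_for_constant(struct constant c)
{
    if (c.is_integer_constant_expr && c.value == 0)
        return TARGET_NULL_BITS;
    return (uint32_t)c.value;
}

/* Toy model of what "if (pointer == 0)" compares on this target. */
bool model_pointer_equals_zero(uint32_t pointer_bits)
{
    struct constant zero = { 0, true };
    return pointer_bits == pointer_bits_for_constant(zero);
}
```

In this model a pointer holding 0xDEADBEEF compares equal to the source-level 0, while a pointer whose stored bits are all zero does not.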