Does &((struct name *)NULL -> b) cause undefined behaviour in C11?

4.6k views Asked by At

Code sample:

struct name
{
    int a, b;
};

int main()
{
    &(((struct name *)NULL)->b);
}

Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".

6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.

The semantics of -> in 6.5.2.3/4 says:

A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue.

However, NULL does not point to an object, so the second sentence seems underspecified.

Also relevant might be 6.5.3.2/1:

Constraints:

The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.

However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.

6.3.2.1/1 does say:

An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined

however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).

This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)

6

There are 6 answers

1
Deduplicator On

First, let's establish that we need a pointer to an object:

6.5.2.3 Structure and union members

4 A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue.96) If the first expression is a pointer to a qualified type, the result has the so-qualified version of the type of the designated member.

Unfortunately, no null pointer ever points to an object.

6.3.2.3 Pointers

3 An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.66) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

Result: Undefined Behavior.

As a side-note, some other things to chew over:

6.3.2.3 Pointers

4 Conversion of a null pointer to another pointer type yields a null pointer of that type. Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.

2
Jens Gustedt On

Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.

The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)

This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.

21
Serge Ballesta On

From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.

From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.

As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.

My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.

20
Aaron Digulla On

No. Let's take this apart:

&(((struct name *)NULL)->b);

is the same as:

struct name * ptr = NULL;
&(ptr->b);

The first line is obviously valid and well defined.

In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.

In fact, the same approach is used on the C macro offsetof (wikipedia):

#define offsetof(st, m) ((size_t)(&((st *)0)->m))

So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).

0
2501 On

Let's start with the indirection operator *:

6.5.3.2 p4: The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type "pointer to type", the result has type "type". If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined. 102)

*E, where E is a null pointer, is undefined behavior.

There is a footnote that states:

102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function designator or an lvalue that is a valid operand of the unary & operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.

Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?

C Standard doesn't define that behavior.

If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.

6.3.2.3 Pointers

  1. An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. 66) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

This means that an integer constant expression with the value 0 is converted to a null pointer constant.

But the value of a null pointer constant is not defined as 0. The value is implementation defined.

7.19 Common definitions

  1. The macros are NULL which expands to an implementation-defined null pointer constant

This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior

Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.

0
supercat On

Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:

  1. Code loads a null pointer into the addressing unit
  2. Code asks the addressing unit to add the offset of field b.
  3. The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
  4. The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.

The very essence of what Undefined Behavior meant at the time.

Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.