C standard addressing simplification inconsistency

267 views Asked by At

Section §6.5.3.2 "Address and indirection operators" ¶3 says (relevant section only):

The unary & operator returns the address of its operand. ... If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. ...

This means that this:

#define NUM 10
int tmp[NUM];
int *i = tmp;
printf("%ti\n", (ptrdiff_t) (&*i - i) );
printf("%ti\n", (ptrdiff_t) (&i[NUM] - i) );

Should be perfectly legal, printing 0 and the NUM (10). The standard seems very clear that both of those cases are required to be optimized.

However, it doesn't seem to require the following to be optimized:

struct { int a; short b; } tmp, *s = tmp;
printf("%ti\n", (ptrdiff_t) (&s->b - s) );

This seems awfully inconsistent. I can see no reason that the above code shouldn't print the sizeof(int) plus (unlikely) padding (possibly 4).

Simplifying a &-> expression is going to be the same conceptually (IMHO) as &[], a simple address-plus-offset. It's even an offset that's going to be determinable at compile time, rather than potentially runtime with the [] operator.

Is there anything in the rationale about why this is so seemingly inconsistent?

3

There are 3 answers

4
James McNellis On BEST ANSWER

In your example, &i[10] is actually not legal: it becomes i + 10, which becomes NULL + 10, and you can't perform arithmetic on a null pointer. (6.5.6/8 lists the conditions under which pointer arithmetic can be performed)

Anyway, this rule was added in C99; it was not present in C89. My understanding is that it was added in large part to make code like the following well-defined:

int* begin, * end;
int v[10];

begin = &v[0];
end = &v[10];

That last line is technically invalid in C89 (and in C++) but is allowed in C99 because of this rule. It was a relatively minor change that made a commonly used construct well-defined.

Because you can't perform arithmetic on a null pointer, your example (&s->b) would be invalid anyway.

As for why there is this "inconsistency," I can only guess. It's likely that no one thought to make it consistent or no one saw a compelling use case for this. It's possible that this was considered and ultimately rejected. There are no remarks about the &* reduction in the Rationale. You might be able to find some definitive information in the WG14 papers, but unfortunately they seem to be quite poorly organized, so trawling through them may be tedious.

8
Scott M. On

I believe that the compiler can choose to pack in different ways, possibly adding padding between members of a struct to increase memory access speed. This means that you can't for sure say that b will always be an offset of 4 away. The single value does not have the same problem.

Also, the compiler may not know the layout of a struct in memory during the optimization phase, thus preventing any sort of optimization concerning struct member accesses and subsequent pointer casts.


edit:

I have another theory...

many times the compiler will optimize the abstract syntax tree just after lexical analysis and parsing. This means it will find things like operators that cancel out and expressions that evaluate to a constant and reduce those sections of the tree to one node. This also means that the information about structs is not available. later optimization passes that occur after some code generation may be able to take this into account because they have additional information, but for things like trimming the AST, that information is not yet there.

4
AProgrammer On

I think that the rule hasn't been added for optimization purpose (what does it bring that the as-if rule doesn't?) but to allow &t[sizeof(t)/sizeof(*t)] and &*(t+sizeof(t)/sizeof(*t)) which would be undefined behaviour without it (writing such things directly may seem silly, but add a layer or two of macros and it can make sense). I don't see a case where special casing &p->m would bring such benefit. Note that as James pointed out, &p[10] with p a null pointer is still undefined behaviour; &p->m with p a null pointer would similarly have stayed invalid (and I must admit that I don't see any use when p is the null pointer).