c - Why does i = ++i invoke undefined behaviour?


I understand that C uses the notion of sequence points to identify ambiguous computations, and that the = operator is not a sequence point. However, I am unable to see any ambiguity in executing the statement

i = ++i

As per my understanding, this simply amounts to evaluating whatever is at &i, incrementing it, and storing it back at the same location. Yet, GCC flags it as under:

[Warning] operation on 'i' may be undefined [-Wsequence-point]

Am I missing something about how = functions?

EDIT : Before marking as duplicate, please note that I have browsed other posts about sequence points and undefined behavior. None of them addresses the expression i=++i (note the pre-increment) specifically. Expressions mentioned are generally i=i++, a=b++ + ++b, etc. And I have no doubts regarding any of them.


There are 2 answers

9
IdeaHat On BEST ANSWER

You are missing something about undefined behavior. Undefined behavior simply means the standard places no requirements on what the compiler does. It can throw an error, it can (as GCC does) show a warning, it can cause demons to fly out of your nose. The primary thing is, it is not guaranteed to behave well or to behave consistently between compilers, so don't do it!

In this case, the compiler does NOT have to guarantee that the side effect of the right-hand side (the increment of i) is completed before the assignment stores its value. This may seem funny to you, but you don't think like a computer. It could, if it wants, calculate the value of ++i in a register, assign it to i, and then perform the increment on the actual variable. So it would look more like

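reg = i + 1;   /* value of ++i computed in a register */
i = reg;       /* the assignment is performed first */
i = i + 1;     /* then the increment is applied to the variable */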

The standard gives you no guarantee that this doesn't happen, so just don't do it!
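To make that hypothetical sequence concrete, here is a minimal sketch that spells it out as three separate statements, next to the plain increment the asker probably intended. The function names are made up for illustration, and the reordering is only one of many things a compiler would be allowed to do, not something any particular compiler is known to do. Both functions are themselves well defined, because each statement ends in its own sequence point.

```c
/* The intended effect: i is modified exactly once. */
int increment_once(int i)
{
    ++i;
    return i;
}

/* Hypothetical model of the reordering described above.
   Written as separate statements, so this function is well defined. */
int hypothetical_reordering(int i)
{
    int reg = i + 1;  /* value of ++i computed into a temporary */
    i = reg;          /* the assignment is performed first */
    i = i + 1;        /* the increment is applied afterwards */
    return i;
}
```

Starting from 5, increment_once gives 6, while the modelled reordering gives 7. If you want the first result, write ++i; or i = i + 1; on its own.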

0
voithos On

The undefined behavior arises because the variable i is modified more than once between two sequence points. A sequence point is a point in execution at which all side effects of previous evaluations are complete and no side effects of subsequent evaluations have yet taken place. The C99 standard (§6.5, paragraph 2) states:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

So, what are the side effects that we are concerned about?

  • ++i, which assigns to i the value i+1
  • i = ++i, which assigns to i the value of the expression ++i, which is i+1

So, we are going to get two (admittedly, equivalent) side effects: assigning i+1 to the variable i. What we're concerned about is, between which two sequence points do these side effects occur?

What operations constitute sequence points? There are multiple, but there is only one that is actually relevant here:

  • at the end of a full expression (in this case, i = ++i is a full expression)

Notably, the pre-increment ++i is not a sequence point. This means that both side effects (the increment and the assignment) will occur between the same two sequence points, modifying the same variable i. Thus, it is undefined behavior; the fact that both modifications happen to store the same value is inconsequential.


But why is it bad to modify a variable multiple times between sequence points? To prevent things like:

i = ++i + 1;

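Here, i is incremented, but it is also assigned the value (i+1) + 1, due to the semantics of the pre-increment. The two stores write different values, and nothing says which one lands last: starting from i = 0, the result could plausibly be 1 or 2. Since the side effects have an ambiguous ordering, the behavior is undefined.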
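As a rough illustration, the two orderings can be modelled as ordinary, well-defined functions in which every store is its own statement. The function names are hypothetical, and these are just two plausible outcomes: since the real expression is undefined, a compiler is not limited to either of them.

```c
/* Ordering A: the increment's store lands first, the assignment's last. */
int order_increment_then_assign(int i)
{
    int value = (i + 1) + 1;  /* value of the expression ++i + 1 */
    i = i + 1;                /* side effect of ++i */
    i = value;                /* side effect of = */
    return i;
}

/* Ordering B: the assignment's store lands first, the increment's last. */
int order_assign_then_increment(int i)
{
    int incremented = i + 1;      /* value that ++i stores into i */
    int value = incremented + 1;  /* value of the expression ++i + 1 */
    i = value;                    /* side effect of = */
    i = incremented;              /* side effect of ++i lands last */
    return i;
}
```

Starting from 0, ordering A leaves 2 in i and ordering B leaves 1, which is the kind of ambiguity the rule is meant to exclude.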

Now, there could hypothetically be a special case made in the standard that multiple modifications between two sequence points are OK as long as the values are the same, but this would likely complicate compiler implementations needlessly, without much benefit.