c++ – Understanding where undefined behavior occurs in arithmetic expressions

Question:

This topic is discussed quite often, but I would still like to understand more precisely where there is UB and where there is not.

Below are some examples, and my thoughts on which is which:

int i = 0, x = 1;
int a[6] = {0, 0, 0, 0, 0, 0};

i = ++i + ++i; // UB
i = i++ + ++i; // UB
x = i++ + ++i; // ?? I think it is UB
x = i++ + i++; // OK
a[i] += i++;   // OK ??
a[++i] = i++;  // UB ??
a[++i] = ++i;  // UB
i += ++i;      // UB
j += j;        // OK

I have always understood one rule: if the value of an object changes more than once in one expression, it is pure UB. It is clear that on some platforms the expected result can and will happen, but the point is that the standard guarantees nothing in such cases. It says that this is either undefined behavior, unspecified behavior, or implementation-defined behavior.

The question is: can any of these examples be something other than UB?

Answer:

C and C++ differ fundamentally in one important detail. C++ tries to preserve the lvalue-ness of an expression result as carefully as possible (it is an lvalue-preserving language), while C, on the contrary, in most cases immediately discards the lvalue-ness of an expression result (it is an lvalue-discarding language).

int a = 0, b = 1;
a = b;     // lvalue in C++, rvalue in C
++b;       // lvalue in C++, rvalue in C
1 ? a : b; // lvalue in C++, rvalue in C
(a, b);    // lvalue in C++, rvalue in C
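The C++ half of these comments can be checked at compile time. The following is a minimal sketch, assuming a C++11 or later compiler; the namespace `lvalue_demo` and the helper `preincrement_refers_to_operand` are hypothetical names introduced only for illustration.

```cpp
#include <type_traits>

// Hypothetical names; the checks rely only on decltype and <type_traits>.
// For an expression that is not a plain name, decltype(expr) yields T&
// exactly when expr is an lvalue of type T. The operand of decltype is
// unevaluated, so none of these checks modifies a or b.
namespace lvalue_demo {

int a = 0, b = 1;

static_assert(std::is_lvalue_reference<decltype(a = b)>::value,
              "assignment yields an lvalue in C++");
static_assert(std::is_lvalue_reference<decltype(++b)>::value,
              "pre-increment yields an lvalue in C++");
static_assert(std::is_lvalue_reference<decltype(1 ? a : b)>::value,
              "conditional with two lvalue operands yields an lvalue in C++");
static_assert(std::is_lvalue_reference<decltype((a, b))>::value,
              "comma expression yields an lvalue in C++");

// For contrast: post-increment yields a prvalue even in C++.
static_assert(!std::is_reference<decltype(b++)>::value,
              "post-increment does not yield an lvalue");

// Because ++x is an lvalue, its address is the address of x itself.
inline bool preincrement_refers_to_operand() {
    int x = 0;
    return &(++x) == &x;
}

}  // namespace lvalue_demo
```

In C, the corresponding expressions are not lvalues, so taking the address of `++x` would not compile there.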

These properties dictate significant differences in how the two languages approach the sequencing of operations in expressions. The first C++ standard (C++98) tried to ignore this point and kept the ordering approach inherited directly from C, but in the end this model was found to be defective and was substantially revised. In the course of this rework, C++ introduced ordering guarantees that had not existed before. As a consequence, some expressions that formally produced UB in C++98 got well-defined behavior in C++11. C++17 added even more ordering relationships to the language, further expanding the range of expressions whose behavior is defined.

Therefore, the answer to the question of whether an expression has UB can differ significantly between C and C++. For example, the expression

i = ++i;

produces UB in C, but has well-defined behavior in C++ (since C++11). The reason for this difference is that C++ guarantees that the pre-increment modifies the variable i before the evaluation of the pre-increment expression is complete, and therefore before the outer assignment stores its result. Nothing of the sort is guaranteed in C.
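A minimal sketch of this case, assuming a C++11 or later compiler; the function name `assign_preincrement` is hypothetical. Some compilers may still emit a warning for this line, since the same source is UB when compiled as C.

```cpp
// Hypothetical helper; assumes C++11 or later.
// The side effect of ++i is sequenced before the value computation of the
// ++i expression, and the side effect of the assignment is sequenced after
// that value computation. The two writes to i are therefore ordered.
inline int assign_preincrement(int i) {
    i = ++i;   // well-defined in C++11 and later: i becomes old value + 1
    return i;
}
```

Calling `assign_preincrement(0)` returns 1 under these rules.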

A rule of the form "if the value of an object changes more than once in one expression" never existed anywhere. A more or less correct form of this rule would be "if the value of an object is modified more than once between a pair of adjacent sequence points, then the behavior is undefined." Also, do not forget the second part of this rule: "if the value of an object is modified between a pair of adjacent sequence points, and there is also an independent read of the value of this object, then the behavior is undefined." After the rework in C++11, however, these rules apply only to C, not to C++ (see the example above).

It is also worth noting that these rules are based on the concept of a sequence point, while the modern specifications of both languages (C++ and C) have abandoned this concept and replaced it with sequencing relations (sequenced before, sequenced after, unsequenced). But, once again, in C these rules still reflect the situation with UB in expressions fairly accurately.

The new rule, applicable in both C and C++, reads as follows:

If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, or relative to a value computation using the value of the same scalar object, then the behavior is undefined.

The differences between C and C++ boil down to different guarantees about what is sequenced and what is unsequenced. The ordering is specified in the descriptions of the individual operators of the language.
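When an expression falls under this rule, one common way out is to split it into separate statements, so that each side effect completes before the next read or write of the same object. Below is a minimal sketch for `x = i++ + i++;`. The helper name `sum_of_two_postincrements` is hypothetical, and the left-to-right order is an assumption about the intent, since the original expression does not prescribe any order.

```cpp
// Hypothetical helper: a well-defined replacement for `x = i++ + i++;`.
// Each statement is a separate full-expression, so every side effect on i
// is sequenced before the next read or write of i.
// Assumption: the left operand is meant to be evaluated first.
inline int sum_of_two_postincrements(int& i) {
    int first = i++;    // reads i, then increments it
    int second = i++;   // reads the updated i, then increments it again
    return first + second;
}
```

Starting from `i == 0`, this returns 1 and leaves `i == 2`, in both C++ and (with a pointer instead of a reference) C.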

In your examples

i = ++i + ++i; // UB in both C and C++
i = i++ + ++i; // UB in both C and C++
x = i++ + ++i; // UB in both C and C++
x = i++ + i++; // UB in both C and C++
a[i] += i++;   // UB in C and in C++11, fine in C++17
a[++i] = i++;  // UB in C and in C++11, fine in C++17
a[++i] = ++i;  // UB in C and in C++11, fine in C++17
i += ++i;      // UB in C, fine in C++11 (?), fine in C++17
j += j;        // OK

(I'm not sure about my interpretation of i += ++i in C++11. A += B is defined in terms of A = A + B, and i = i + ++i is UB in C++ too. But I suspect that the situation is saved by the fact that A in A += B is evaluated only once.)
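The "fine in C++17" entries follow from the C++17 rule that, in an assignment or compound assignment, the right operand is sequenced before the left operand. The sketch below spells out that order as separate statements, which makes it well-defined under any standard. The function names are hypothetical and serve only as illustration.

```cpp
// Hypothetical helpers spelling out the order C++17 prescribes for
// assignment and compound assignment: the right operand is evaluated
// completely (value and side effects) before the left operand.

// Equivalent of `a[i] += i++;` under the C++17 rules.
inline void add_assign_postincrement(int* a, int& i) {
    int rhs = i++;   // right operand first: yields old i, then increments i
    a[i] += rhs;     // left operand sees the already incremented i
}

// Equivalent of `a[++i] = i++;` under the C++17 rules.
inline void assign_postincrement(int* a, int& i) {
    int rhs = i++;   // right operand first
    a[++i] = rhs;    // then the left operand increments i once more
}
```

For example, with `i == 1` and a zero-filled array, the first helper adds 1 to `a[2]` and leaves `i == 2`, while the second stores 1 in `a[3]` and leaves `i == 3`.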
