Asked  7 Months ago    Answers:  5   Viewed   48 times

The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.

Let us take a simple example:

int a=1;  // Line 1
a = a++ + ++a;  // Line 2
printf("%d",a);  // Line 3

Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:

  1. Between evaluation of the left and right operands of the && (logical AND), || (logical OR), and comma operators. For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.

  2. Between the evaluation of the first operand of the ternary "question-mark" operator and the second or third operand. For example, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.

  3. At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.

  4. Before a function is entered in a function call. The order in which the arguments are evaluated is not specified, but this sequence point means that all of their side effects are complete before the function is entered. In the expression f(i++) + g(j++) + h(k++), f is called with a parameter of the original value of i, but i is incremented before entering the body of f. Similarly, j and k are updated before entering g and h respectively. However, it is not specified in which order f(), g(), h() are executed, nor in which order i, j, k are incremented. The values of j and k in the body of f are therefore undefined.3 Note that a function call f(a,b,c) is not a use of the comma operator and the order of evaluation for a, b, and c is unspecified.

  5. At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard; it is present only implicitly in C.)

  6. At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.

Thus, going by Point # 3:

At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.

Line 2 clearly leads to Undefined Behavior. This shows how Undefined Behaviour is tightly coupled with Sequence Points.

Now let us take another example:

int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5

Now its evident that Line 5 will make the variable result store 1.

Now the expression x<y<z in Line 5 can be evaluated as either:

x<(y<z) or (x<y)<z. In the first case the value of result will be 0 and in the second case result will be 1. But we know, when the Operator Precedence is Equal/Same - Associativity comes into play, hence, is evaluated as (x<y)<z.

This is what is said in this MSDN Article:

The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.

Now, about the above article:

It mentions "Expressions with higher-precedence operators are evaluated first."

It may sound incorrect. But, I think the article is not saying something wrong if we consider that () is also an operator x<y<z is same as (x<y)<z. My reasoning is if associativity does not come into play, then the complete expressions evaluation would become ambiguous since < is not a Sequence Point.

Also, another link I found says this on Operator Precedence and Associativity:

This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.

So taking, the second example of int result=x<y<z, we can see here that there are in all 3 expressions, x, y and z, since, the simplest form of an expression consists of a single literal constant or object. Hence the result of the expressions x, y, z would be there rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.

Now, doesn't Associativity come into play since now we have 2 expressions to be evaluated, either 10<1 or 1<2 and since the precedence of operator is same, they are evaluated from left to right?

Taking this last example as my argument:

int myval = ( printf("Operatorn"), printf("Precedencen"), printf("vsn"),
printf("Order of Evaluationn") );

Now in the above example, since the comma operator has same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.

In SO/IEC 9899:201x under J.1 Unspecified behavior it mentions:

The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ?:, and comma operators (6.5).

Now I would like to know, would it be wrong to say:

Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.

I would like to be corrected if any mistakes were made in something I said in my question. The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?

 Answers

21

Yes, the MSDN article is in error, at least with respect to standard C and C++1.

Having said that, let me start with a note about terminology: in the C++ standard, they (mostly--there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.

It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.

Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:

 push(z);    // Evaluates its argument and pushes value on stack
 push(y);
 push(x);
 test_less();  // compares TOS to TOS(1), pushes result on stack
 test_less();

This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.

Summary: Order of evaluation is independent of associativity.

Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:

push(z);
push(y);
push(x);
mul();
add();

Summary: Order of evaluation is independent of precedence.

When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:

push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);

This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.

Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea is fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.

Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs, allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just like if you used multiple threads on architectures with which most of us are more familiar.

Summary: Order of evaluation is independent of apparent dependencies

Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.

Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:

load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
    load variable[i]
    add_with_carry 0
    store variable[i]
}

If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.

This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.

Conclusion

Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.


1 I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false -- they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.

Tuesday, June 1, 2021
 
joostvandriel
answered 7 Months ago
70

You need to ask Brian Kernighan or Dennis Ritchie.
From this forum: http://bytes.com/topic/c/answers/167377-operator-precedence

The && and || operators were added later for their "short-circuiting" behavior. Dennis Ritchie admits in retrospect that the precedence of the bitwise operators should have been changed when the logical operators were added. But with several hundred kilobytes of C source code in existence at that point and an installed base of three computers, Dennis thought it would be too big of a change in the C language...

So, that might be a reason? I'm guessing since there are several layers of bitwise precendence (unlike relational comparisons) that it's cruft that's existed since...forever...and just was never corrected.

Wednesday, June 2, 2021
 
turik
answered 6 Months ago
27

In C++, for user-defined types a + b is a function call, and the standard says:

§5.2.2.8 - [...] The order of evaluation of function arguments is unspecified. [...]

For normal operators, the standard says:

§5.4 - Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. [...]

These haven't been changed for C++11. However, the wording changes in the second one to say that the order is "unsequenced" rather than unspecified, but it is essentially the same.

I don't have a copy of the C standard, but I imagine that it is the same there as well.

Wednesday, June 2, 2021
 
kwhohasamullet
answered 6 Months ago
88

An operator precedence table would show that a -> binds more tightly than the cast.

typedef void *typeA;
typeB b;
typeA a = &b;
(typeB *)a->i;   /* wrong: attempt to dereference a void pointer */
((typeB *)a)->i; /* ok */

A complete operator precedence table for your future reference is provided below. In the table, operators higher in the list bind more tightly than operators lower in the list (so the Primary Expression Operators bind the most tightly). If operators of the same precedence are used in the same expression in an ambiguous way (that is, not captured within a Primary Expression Operator like () or []), then it is resolved by following the associativity direction. So, for example the expression:

7 + (3 - 5 * 2 + 15) - 6

is evaluated in this order (according to the table):

7 + (3 - 10 + 15) - 6     // evaluate * in ()
7 + (-7 + 15) - 6         // evaluate - in () (it is left of the +)
7 + 8 - 6                 // evaluate + in ()
15 - 6                    // evaluate + (it is left of the -)
9                         // evaluate -

Of course, the compiler is free to perform the computation differently, but its result must match the result obtained when following the precedence rules.

Operator Type   Operator(s)                                     Associativity
=============   ===========                                     =============
Primary         () [] . -> expr++ expr--                        left-to-right
Expression 
Operators       
-------------   -----------                                     -------------
Unary           * & + - ! ~ ++expr --expr (typecast) sizeof     right-to-left
Operators       
-------------   -----------                                     -------------
Binary          * / %                                           left-to-right
Operators       + -
                >> <<
                < > <= >=
                == !=
                &
                ^
                |
                &&
                ||
-------------   -----------                                     -------------
Ternary         ?:                                              right-to-left
Operator
-------------   -----------                                     -------------
Assignment      = += -= *= /= %= >>= <<= &= ^= |=               right-to-left
Operators       
-------------   -----------                                     -------------
Comma           ,                                               left-to-right
=============   ===========                                     =============
Friday, July 30, 2021
 
hnkk
answered 4 Months ago
62

The important part about order of evaluation is whether any of the components have side effects.

Suppose you have this:

int i = c() + a() * b();

Where a and b have side effects:

int global = 1;

int a() {
    return global++;
}
int b() {
    return ++global;
}
int c() {
    return global * 2;
}

The compiler can choose what order to call a(), b() and c() and then insert the results into the expression. At that point, precedence takes over and decides what order to apply the + and * operators.

In this example the most likely outcomes are either

  1. The compiler will evaluate c() first, followed by a() and then b(), resulting in i = 2 + 1 * 3 = 5
  2. The compiler will evaluate b() first, followed by a() and then c(), resulting in i = 6 + 2 * 2 = 10

But the compiler is free to choose whatever order it wants.

The short story is that precedence tells you the order in which operators are applied to arguments (* before +), whereas order of evaluation tells you in what order the arguments are resolved (a(), b(), c()). This is why they are "different but related".

Friday, July 30, 2021
 
nasty
answered 4 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share