How can I prevent the gcc optimizer from producing incorrect bit operations?

Question

Consider the following program.

#include <stdio.h>

int negative(int A) {
    return (A & 0x80000000) != 0;
}
int divide(int A, int B) {
    printf("A = %d\n", A);
    printf("negative(A) = %d\n", negative(A));
    if (negative(A)) {
        A = ~A + 1;
        printf("A = %d\n", A);
        printf("negative(A) = %d\n", negative(A));
    }
    if (A < B) return 0;
    return 1;
}
int main(){
    divide(-2147483648, -1);
}

When it is compiled without compiler optimizations, it produces expected results.

gcc  -Wall -Werror -g -o TestNegative TestNegative.c
./TestNegative
A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 1

When it is compiled with compiler optimizations, it produces the following incorrect output.

gcc -O3 -Wall -Werror -g -o TestNegative TestNegative.c
./TestNegative 
A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 0

I am running gcc version 5.4.0.

Is there a change I can make in the source code to prevent the compiler from producing this behavior under -O3?

A = ~A + 1; is UB, if A == INT_MIN, the +1 makes a signed integer overflow.
– mch, Commented Feb 13, 2018 at 7:54
Isn't 0x7FFFFFFF + 1 undefined behaviour anyway for 32-bit int type?
– Weather Vane, Commented Feb 13, 2018 at 7:54
@mch, I think you hit the nail on the head. It never occurred to me that signed integer overflow doesn't behave like unsigned integer overflow. I just saw this question so now I know the difference.
– merlin2011, Commented Feb 13, 2018 at 7:56
@Someprogrammerdude No, it absolutely isn’t known for generating faulty code. Sure, there are compiler bugs in GCC (everything has bugs) but -O3 is a safe default, and there are not fundamentally more correctness-affecting bugs in that setting than in GCC in general. stackoverflow.com/a/11546263/1968
– Konrad Rudolph, Commented Feb 13, 2018 at 12:23
@Voo It’s fair to say that this is plausible. But, barring evidence, it’s simply not the case in current compilers (-O3 used to be experimental, and hence buggy; but it hasn’t been for a long time). To say that it “is known to sometimes generate faulty code” is flat out wrong.
– Konrad Rudolph, Commented Feb 13, 2018 at 16:37

jfMR · Accepted Answer · 2018-02-13 10:53:57Z

83

-2147483648 does not do what you think it does. C doesn't have negative constants. Include limits.h and use INT_MIN instead (pretty much every INT_MIN definition on two's complement machines defines it as (-INT_MAX - 1) for a good reason).
A = ~A + 1; invokes undefined behavior when A is INT_MIN: ~A is then INT_MAX, and adding 1 causes signed integer overflow.

It's not the compiler, it's your code.

edited Feb 13, 2018 at 10:53

jfMR


answered Feb 13, 2018 at 8:02

Art


4 Comments

merlin2011 Over a year ago

Does the second line invoke undefined behavior only if the input is INT_MIN?

Art Over a year ago

@merlin2011 yes.

phuclv Over a year ago

stackoverflow.com/a/3803981/995714 stackoverflow.com/q/26003893/995714 In C90, -2147483648 actually has type unsigned long (where long is 32 bits) stackoverflow.com/q/25658485/995714

Art Over a year ago

Ok, I've got to ask. What's happening? Who linked to this? I'm getting an insane amount of votes for what is a trivial answer to a trivial question.

vgru · Accepted Answer · 2018-02-13 08:48:46Z

45

The compiler replaces your A = ~A + 1; statement with a single neg instruction, i.e. this code:

int just_negate(int A) {
    A = ~A + 1;
    return A;
}

will be compiled to:

just_negate(int):
  mov eax, edi
  neg eax         // just negate the input parameter
  ret

But the compiler is also smart enough to realize that, if A & 0x80000000 was non-zero before negation, it must be zero after negation, unless you are relying on undefined behavior.

This means that the second printf("negative(A) = %d\n", negative(A)); can be "safely" optimized to:

mov edi, OFFSET FLAT:.LC0    // .string "negative(A) = %d\n"
xor eax, eax                 // just set eax to zero
call printf

I use the online godbolt compiler explorer to check the assembly for various compiler optimizations.

answered Feb 13, 2018 at 8:48

vgru


1 Comment

Martin Bonner supports Monica Over a year ago

This is a useful explanation of why the compiler produces unexpected results at high optimization levels. It is not looking for ways to trick the programmer when they invoke UB; it is looking for ways to make the code run faster, and assuming no UB to do that.

Lundin · Accepted Answer · 2018-02-13 14:13:55Z

To explain in detail what's going on here:

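In this answer I'm assuming that long is 32 bits and long long is 64 bits. This is common (for example on Windows and on most 32-bit targets), but not guaranteed; where long is 64 bits, as on 64-bit Linux, the constant ends up as a long instead.
C does not have negative integer constants. -2147483648 is actually the constant 2147483648, which has type long long, with the unary minus operator applied to it.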

The compiler picks the type of the integer constant after checking if 2147483648 can fit:
- Inside an int? No it cannot.
- Inside a long? No it cannot.
- Inside a long long? Yes it can. So the type of the integer constant will therefore be long long. Then apply unary minus on that long long.
Then you pass this negative long long to a function expecting an int. A good compiler might warn here. You force an implicit conversion to a smaller type (the argument is converted as if by assignment).
However, assuming 2's complement, the value -2147483648 can fit inside an int, so no implementation-defined behavior is needed for the conversion, which would otherwise have been the case.
Next tricky part is the function negative where you use 0x80000000. This is not an int either, nor is it a long long, but an unsigned int (see this for an explanation).

When you apply & to your passed int and an unsigned int, "the usual arithmetic conversions" (see this) force an implicit conversion of the int to unsigned int. It doesn't affect the result in this specific case, but this is why gcc -Wconversion users do get a nice warning here.

(Hint: enable -Wconversion already! It is good for catching subtle bugs, but not part of -Wall or -Wextra.)
Next you do ~A, a bitwise inverse of the binary representation of the value, ending up with the value 0x7FFFFFFF. This is, as it turns out, the same value as INT_MAX on your 32 or 64 bit system. Thus 0x7FFFFFFF + 1 gives a signed integer overflow which leads to undefined behavior. This is the reason why the program is misbehaving.

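Cheekily, we could change the code to A = ~A + 1u; and suddenly everything works as expected, again because of an implicit conversion: the usual arithmetic conversions turn ~A into an unsigned int, where the addition wraps around instead of overflowing. Converting the result back to int on assignment is implementation-defined rather than undefined, and gcc defines it to wrap.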

Lessons learned:

In C, integer constants, as well as implicit integer promotions, are very dangerous and unintuitive. They can subtly change the meaning of the program completely and introduce bugs. At each and every operation in C, you need to consider the actual types of the operands involved.

Playing around with C11 _Generic could be a good way to see the actual types. Example:

#define TYPE_SAFE(val, type) _Generic((val), type: val)
...
(void) TYPE_SAFE(-2147483648, int); // won't compile, type is long or long long
(void) TYPE_SAFE(0x80000000, int);  // won't compile, type is unsigned int

Good safety measures to protect yourself from bugs like these are to always use stdint.h and to follow MISRA-C.

@Groo _Generic is a really nice feature and reason alone to go for C11. There's apparently some ambiguous behavior about it still, where different compilers interpret the standard differently. Hopefully this will be fixed in Cxx.

Matteo Italia · Accepted Answer · 2018-02-13 14:11:03Z

You are relying on undefined behavior. 0x7fffffff + 1 for 32 bit signed integers results in signed integer overflow, which is undefined behavior according to the standard, so anything goes.

In gcc you can force wraparound behavior by passing -fwrapv; still, if you have no control over the flags - and more generally, if you want a more portable program - you should do all these tricks on unsigned integers, which are required by the standard to wrap around (and have well defined semantics for bitwise operations, unlike signed integers).

First convert the int to unsigned (well defined according to the standard, yields the expected result), do your stuff, convert back to int - implementation-defined (≠ undefined) for values bigger than the range of int, but actually defined by every compiler working in 2's complement to do the "right thing".

int divide(int A, int B) {
    printf("A = %d\n", A);
    printf("negative(A) = %d\n", negative(A));
    if (negative(A)) {
        A = ~((unsigned)A) + 1;
        printf("A = %d\n", A);
        printf("negative(A) = %d\n", negative(A));
    }
    if (A < B) return 0;
    return 1;
}

Your version (at -O3):

A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 0

My version (at -O3):

A = -2147483648
negative(A) = 1
A = -2147483648
negative(A) = 1
