27

This article, http://googleresearch.blogspot.sg/2006/06/extra-extra-read-all-about-it-nearly.html, mentions that most quicksort implementations have a bug in computing the midpoint as (left+right)/2, and points out that the fix is to use left+(right-left)/2 instead. The same fix is given in the question "Bug in quicksort example (K&R C book)?".

My question is: why does left+(right-left)/2 avoid overflow? How can this be proven? Thanks in advance.

8 Answers

53

You have left < right by definition, and since both are array indices, left >= 0.

As a consequence, 0 < right - left <= right, and furthermore left + (right - left) = right (follows from basic algebra).

And consequently left + (right - left) / 2 <= right. So no overflow can happen, since the result of every step of the operation is bounded by the value of right.


By contrast, consider the buggy expression, (left + right) / 2. left + right >= right, and since we don’t know the values of left and right, it’s entirely possible that that value overflows.
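To check this claim step by step, here is a minimal Java sketch. It assumes 0 <= left <= right and uses Math.addExact and Math.subtractExact, which throw ArithmeticException when an int result overflows. The class and method names are made up for illustration.

```java
final class MidpointOverflowCheck {
    // Naive midpoint; Math.addExact throws ArithmeticException if left + right overflows int.
    static int naiveMidChecked(int left, int right) {
        return Math.addExact(left, right) / 2;
    }

    // Safe midpoint with every intermediate step checked (assumes 0 <= left <= right).
    static int safeMidChecked(int left, int right) {
        int gap = Math.subtractExact(right, left); // 0 <= gap <= right
        return Math.addExact(left, gap / 2);       // left <= result <= right
    }

    // Returns true if the naive formula overflows for these inputs.
    static boolean naiveOverflows(int left, int right) {
        try {
            naiveMidChecked(left, right);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int left = Integer.MAX_VALUE - 2;
        int right = Integer.MAX_VALUE;
        System.out.println(naiveOverflows(left, right));  // true
        System.out.println(safeMidChecked(left, right));  // 2147483646
    }
}
```

With these inputs the checked naive version throws, while the checked safe version completes without any step leaving the int range.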


16

Suppose (to make the example easier) the maximum integer is 100, left = 50, and right = 80. If you use the naive formula:

int mid = (left + right)/2;

the addition will result in 130, which overflows.

If you instead do:

int mid = left + (right - left)/2;

you can't overflow in (right - left) because you're subtracting a smaller non-negative number from a larger one. That always results in a number no larger than right, so it can't possibly go over the maximum. E.g. 80 - 50 = 30.

And since the result is the average of left and right, it must be between them. Since these are both less than the maximum integer, anything between them is also less than the maximum, so there's no overflow.


8

Basic logic.

  1. by definition left <= MAX_INT
  2. by definition right <= MAX_INT
  3. left+(right-left) is equal to right, which already is <= MAX_INT per #2
  4. and so left+(right-left)/2 must also be <= MAX_INT, since x/2 is never larger than x for x >= 0 (and right-left >= 0 because left <= right).

Compare to the original

  1. by definition left <= MAX_INT
  2. by definition right <= MAX_INT
  3. therefore left+right <= MAX_INT
  4. and so (left+right)/2 <= MAX_INT

where statement 3 is clearly false, since left can be MAX_INT (statement 1) and so can right (statement 2).
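The first argument can also be written as a chain of inequalities. This is only a sketch, and it assumes the extra precondition $0 \le \text{left} \le \text{right} \le \text{MAX}$ (true for array indices) and that $/$ is integer division rounding toward zero:

```latex
\begin{align*}
0 \le \text{right} - \text{left} &\le \text{right} \le \text{MAX} \\
0 \le \left\lfloor \frac{\text{right} - \text{left}}{2} \right\rfloor &\le \text{right} - \text{left} \\
\text{left} \le \text{left} + \left\lfloor \frac{\text{right} - \text{left}}{2} \right\rfloor
  &\le \text{left} + (\text{right} - \text{left}) = \text{right} \le \text{MAX}
\end{align*}
```

Every intermediate value (the difference, the halved difference, and the final sum) therefore lies in $[0, \text{MAX}]$.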


8

A simple worked example will show it. For simplicity, assume numbers overflow above 999. If we have:

left = 997
right = 999

then:

left + right = 1996

which has overflowed before we get to the /2. However:

right - left = 2
(right-left)/2 = 1
left + (right-left)/2 = 997 + 1 = 998

So we've avoided the overflow.

More generally (as others have said): if both left and right are within range (and assuming right > left), then (right-left)/2 will be within range, and so too must left + (right-left)/2, since this must be less than right (you've increased left by half the gap between it and right).
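For context, this is how the safe midpoint might look inside a search loop. It is a minimal Java sketch of a plain binary search over a sorted int array, not code from the linked article; the class and method names are made up for illustration.

```java
final class SafeBinarySearch {
    // Returns the index of key in the sorted array, or -1 if it is absent.
    static int indexOf(int[] sorted, int key) {
        int left = 0;
        int right = sorted.length - 1;
        while (left <= right) {
            // 0 <= left <= right here, so neither the subtraction nor the addition can overflow.
            int mid = left + (right - left) / 2;
            if (sorted[mid] < key) {
                left = mid + 1;
            } else if (sorted[mid] > key) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 5, 7, 9};
        System.out.println(indexOf(data, 7)); // 3
        System.out.println(indexOf(data, 4)); // -1
    }
}
```

The loop condition keeps left <= right, which is exactly the precondition the midpoint formula relies on.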

1 Comment

Incorrect calculation: with left = 997 and right = 999, left + right = 1996, not 1995.
6

In Java (assuming that is the programming language), int is a 32-bit type, so any result that does not fit in 32 bits wraps around. In numerical terms, adding 1 to Integer.MAX_VALUE (2147483647) yields -2147483648.

Coming to the question above, let's assume the following:

int left = 1;
int right = Integer.MAX_VALUE;
int mid;

Case 1:

mid = (left + right) / 2;
// Here left + right overflows and wraps to -2147483648, so mid becomes -1073741824.

Case 2:

mid = left + (right - left)/2;
//This would not have the same problem as above as the value would never exceed "right".

In theory:

Ignoring overflow, both expressions have the same value, since left + (right - left)/2 = (2*left + right - left)/2 = (left + right)/2

Hope this answers your question.
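The two cases can be run as-is. Here is a minimal Java sketch with the same inputs (left = 1, right = Integer.MAX_VALUE). The class and method names are made up for illustration. The third method, (left + right) >>> 1, is a commonly cited Java alternative that I have added for comparison; it is not part of the cases above.

```java
final class MidpointCases {
    // Case 1: the sum wraps around before the division.
    static int naiveMid(int left, int right) {
        return (left + right) / 2;
    }

    // Case 2: no intermediate value exceeds right (assuming 0 <= left <= right).
    static int safeMid(int left, int right) {
        return left + (right - left) / 2;
    }

    // Alternative: the sum may wrap, but the unsigned shift reads it as a 32-bit
    // unsigned value, which is correct for non-negative left and right.
    static int unsignedShiftMid(int left, int right) {
        return (left + right) >>> 1;
    }

    public static void main(String[] args) {
        int left = 1;
        int right = Integer.MAX_VALUE;
        System.out.println(naiveMid(left, right));         // -1073741824
        System.out.println(safeMid(left, right));          // 1073741824
        System.out.println(unsignedShiftMid(left, right)); // 1073741824
    }
}
```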


2

(This is more an intuitive explanation than a proof.)

Assume your data is unsigned char, and left = 100 and right = 255 (so right is at the edge of the range). If you do left + right, you'll get 355, which does not fit the unsigned char range, so it will overflow.

However, (right-left)/2 is a quantity X such that left + X <= right <= MAX, where MAX is 255 for unsigned char. This way, you can be sure that the sum can never overflow.
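The 8-bit case can be simulated with a small Java sketch. Java has no unsigned char, so this assumes an 8-bit unsigned value can be mimicked by masking an int with 0xFF after each operation. The class, the wrap8 helper, and the method names are made up for illustration. (In real C the operands are promoted to int first, so the truncation only happens when the sum is stored back into an unsigned char.)

```java
final class EightBitMidpoint {
    // Keeps only the low 8 bits, mimicking storage in an unsigned 8-bit variable.
    static int wrap8(int value) {
        return value & 0xFF;
    }

    // Naive midpoint: the 8-bit sum wraps before it is halved.
    static int naiveMid8(int left, int right) {
        return wrap8(left + right) / 2;
    }

    // Safe midpoint: every intermediate value stays within 0..255 when left <= right.
    static int safeMid8(int left, int right) {
        int half = wrap8(right - left) / 2;
        return wrap8(left + half);
    }

    public static void main(String[] args) {
        System.out.println(naiveMid8(100, 255)); // 49  (355 wraps to 99, and 99 / 2 = 49)
        System.out.println(safeMid8(100, 255));  // 177 (100 + 155 / 2)
    }
}
```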


0

Why not m = (l - r) / 2? Don't we no longer need the already-traversed indexes from the start up to the current left?

0

The earlier answers have already explained the question itself clearly. But when I tried to figure out how the expression is actually evaluated, I found something interesting.

The new question is: what happens when the code mid = left + right - left runs? Does it do the add first and then the sub? If so, does it overflow along the way, and is the result affected?

The answer: whether the add is done before the sub depends on the compiler. If it is, the intermediate sum overflows (which is formally undefined behavior for signed int in C), but on the compilers tested below the result is not affected.

Test Code 1:

int square() {
    int mid, left = 2147483647, right = 2147483647;
    mid = left + right - left;
    return mid;
}

Compiled with x86-64 Clang 18.1.0:

square:                                 # @square
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 8], 2147483647
        mov     dword ptr [rbp - 12], 2147483647
        mov     eax, dword ptr [rbp - 8]
        add     eax, dword ptr [rbp - 12] # add first (eax = -2)
        sub     eax, dword ptr [rbp - 8]  # sub second (eax = 2147483647) 
        mov     dword ptr [rbp - 4], eax
        mov     eax, dword ptr [rbp - 4]
        pop     rbp
        ret

Compiled with x86-64 gcc 14.1:

square:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 2147483647
        mov     DWORD PTR [rbp-8], 2147483647
        mov     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-12], eax
        mov     eax, DWORD PTR [rbp-12]  # no add or sub at all (optimized away)
        pop     rbp
        ret

Compiled with loongarch64 gcc 14.1.0:

square:
        addi.d  $r3,$r3,-32
        st.d    $r22,$r3,24
        addi.d  $r22,$r3,32
        lu12i.w $r12,2147479552>>12                 # 0x7ffff000
        ori     $r12,$r12,4095
        st.w    $r12,$r22,-20
        lu12i.w $r12,2147479552>>12                 # 0x7ffff000
        ori     $r12,$r12,4095
        st.w    $r12,$r22,-24
        ld.w    $r12,$r22,-24   # first
        st.w    $r12,$r22,-28   # second (same as x86-64 gcc: add and sub optimized away)
        ldptr.w $r12,$r22,-28
        or      $r4,$r12,$r0
        ld.d    $r22,$r3,24
        addi.d  $r3,$r3,32
        jr      $r1

So the conclusion is that although the intermediate step may overflow, the result is not affected. (Note: don't confuse this with (left + right)/2, where the overflowed sum is divided and does produce a wrong result.)

Back to left+(right-left)/2: according to the assembly code produced by clang, it does (right-left) first, then the division, and finally the addition.
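The cancellation can be reproduced without reading assembly. Here is a minimal sketch in Java rather than C, because Java defines int overflow as two's-complement wraparound, whereas signed overflow in C is undefined behavior and a C compiler is free to do something else. The class and method names are made up for illustration.

```java
final class WrapAroundCancel {
    // Add first, then subtract: the intermediate sum may wrap, but subtracting
    // left wraps back by the same amount, so the result equals right.
    static int addThenSubtract(int left, int right) {
        return left + right - left;
    }

    public static void main(String[] args) {
        int left = Integer.MAX_VALUE;
        int right = Integer.MAX_VALUE;
        System.out.println(left + right);                 // -2 (wrapped intermediate sum)
        System.out.println(addThenSubtract(left, right)); // 2147483647
        System.out.println((left + right) / 2);           // -1 (here the wrap is not undone)
    }
}
```

The last line shows why the same reasoning does not rescue (left + right) / 2: the division is applied to the wrapped sum, and nothing undoes the wrap afterwards.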

Test Code 2:

int square() {
    int mid, left = 2147483647, right = 2147483647;
    mid =  left + (right - left) / 2 ;
    return mid;
}

Compiled with Clang:

square:                                 # @square
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 8], 2147483647
        mov     dword ptr [rbp - 12], 2147483647
        mov     eax, dword ptr [rbp - 8]
        mov     dword ptr [rbp - 16], eax # 4-byte Spill
        mov     eax, dword ptr [rbp - 12]
        sub     eax, dword ptr [rbp - 8] # "sub first"
        mov     ecx, 2
        cdq
        idiv    ecx # "division second"
        mov     ecx, eax
        mov     eax, dword ptr [rbp - 16] # 4-byte Reload
        add     eax, ecx # "add last"
        mov     dword ptr [rbp - 4], eax
        mov     eax, dword ptr [rbp - 4]
        pop     rbp
        ret


Disclaimer: the assembly code is from Compilers, the answer is just for fun.

