Why is the Overflow-Flag only set when single shifts are used?

Question

In the x86 intel reference manual it says:

"The overflow flag is set only if the single-shift forms of the instruction are used. [...]"

But when I have the following scenario:

xor eax, eax
mov al, 0b11000000
shl al, 2
;content of al: 00000000

Here the high bit of the result (0) is not the same as the carry-out (CF = 1), yet the overflow flag is not set.

I don't get why this is the correct behavior. Why is the overflow flag set only when single shifts are used?

I don't understand your question. The example you showed demonstrates that the overflow flag behaves as indicated in the manual: it is not changed.
– fuz, Commented Apr 6, 2022 at 17:41
@fuz I don't get why this is the correct behavior. Why is the overflow flag only set when single shifts are used?
– stht55, Commented Apr 6, 2022 at 18:09
Updated my answer: my guess was wrong; CPUs don't always set OF=0 for counts > 1. (And certainly not based on opcode rather than the count value.) I also added some 8086 history to see if I could find anything in Stephen Morse's book, but didn't really find much.
– Peter Cordes, Commented Apr 7, 2022 at 2:51
@fuz: It's "undefined", not "unchanged". Real CPUs always update it to something independent of its previous value for any non-zero count, dependent only on the integer input and count. (Of course, leaving it unmodified is possible, but would create a data dependency on the old FLAGS value, which shl reg, imm8 otherwise doesn't need.) In my experiments for my answer, I didn't actually try running the same shift instruction with old OF set either way, but I definitely see OF change from 0 to 1 in the cases that produce OF=1.
– Peter Cordes, Commented Apr 7, 2022 at 2:54

Peter Cordes · Accepted Answer · 2022-04-07 02:50:47Z

OF=undefined for shift counts other than 1; results in practice depend on your CPU. See below for my theory of how it's set on my Intel CPU.

This design decision makes some sense, letting the hardware be slightly simpler.

Detecting 2's complement signed overflow properly would require checking that all bits shifted out matched the new MSB. That's different from just checking the final bit shifted out like it does now with CF, so would require some internal state for a one-at-a-time shifter like original 8086 used.
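
To make the "all bits shifted out must match the new MSB" condition concrete, here is a minimal Python sketch of that ideal check. It models the mathematically correct signed-overflow answer, not what any real CPU writes to OF, and the function name `shl_signed_overflow` and its `width` parameter are made up for illustration.

```python
def shl_signed_overflow(value, count, width=8):
    """Model `value << count` at `width` bits (hypothetical helper).

    Returns (result, cf, overflowed), where `overflowed` is the ideal
    2's complement overflow answer, NOT the OF value of any real CPU.
    """
    if not 1 <= count < width:
        raise ValueError("count must satisfy 1 <= count < width")
    mask = (1 << width) - 1
    value &= mask
    result = (value << count) & mask
    new_msb = result >> (width - 1)
    # The top `count` bits of the input are the bits shifted out.
    shifted_out = value >> (width - count)
    # No overflow only if every shifted-out bit equals the new MSB.
    expected = (1 << count) - 1 if new_msb else 0
    cf = shifted_out & 1  # the last bit shifted out, as CF reports
    return result, cf, shifted_out != expected


# The question's example: 0b11000000 << 2 is -64 * 4, which does not fit in 8 bits.
print(shl_signed_overflow(0xC0, 2))  # (0, 1, True)
# A single shift of the same value is fine: -64 * 2 == -128.
print(shl_signed_overflow(0xC0, 1))  # (128, 1, False)
```

Note that the check needs every shifted-out bit, while CF needs only the last one; that is the extra state a one-bit-at-a-time shifter would have to carry.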

That's perhaps what Stephen Morse (architect of the 8086 ISA) was thinking when he made the design choice for 8086. His book, the 8086 Primer, is available for free on his web site, and confirms (pg96) that 8086 has undefined OF for the variable-count opcode. (For 8086, apparently that includes shl al, cl with CL=1, unlike how Intel currently documents.) The section about shift instructions and what they're for (pg64-66) doesn't mention OF, only CF.

Having to check all the bits shifted out might also make a barrel shifter more expensive, but Morse was less likely to be thinking of that.

IDK why Morse didn't define OF as always being set in some specific way, perhaps according to CF not matching the current MSB, which is probably not useful but still would be meaningful for counts of 1. The ALU is already required to get the last bit shifted out for CF. Perhaps that's because 8086 didn't define anything for OF in the variable-count opcode, even if the count happened to be 1.

Note that some CPUs in practice produce OF=1 for some cases with a count greater than 1. For example, my i7-6700k Skylake does with 0x7f << 2. The documentation says:

"OF flag is affected only for 1-bit shifts (see “Description” above); otherwise, it is undefined."

Undefined is not the opposite of affected; that would be "unaffected". OF is always set to some value; Intel just doesn't document how the CPU picks 0 vs. 1.

Actually leaving OF unmodified would force modern CPUs to read and merge the old FLAGS value even for immediate shift counts other than 0, the way they already must for variable counts in case the count is 0, so it's good that it's not specified that way. (shl reg, cl is 3 uops on Sandybridge-family because of the need to leave FLAGS unmodified in case CL&31 == 0.) That would be an unwanted data dependency, unlike now, where shifts write all flags unless the count is 0.

I tested my CPU with this NASM program:

_start:
    mov cl, 7
    mov dl, 0x7f       ; GDB   set $dl = 0xc0  or whatever after this
.loop:
    mov eax, edx
    shl al, cl
    dec cl             ; set a breakpoint here to look at EFLAGS after every continue
    jnz .loop
;; fall off the end; I'm only single-stepping this in GDB anyway

Assemble + link into a static executable with nasm+ld, run with GDB and use layout reg / layout next. Use starti and si.

My Skylake CPU does set OF=1 for shl al,cl with AL=0x7f CL=2 (or 1 or any non-zero count). Or for AL=0x80. But it never sets it for AL=0x3 for any count, or for AL=0xc0 (0b1100_0000).

My current guess to explain the behaviour is that OF is set as if it was a shift by 1,
i.e. OF = (input[MSB] != input[MSB-1]), computed from the input bits regardless of the count.

This makes sense; it gives the correct result in the case where the paper spec requires a specific result, and it's cheap to implement. (The OF output would still have to come from different bits depending on the operand-size.)
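
As a sanity check, that guess can be written as a small Python model and compared against the observations above. This is only a sketch of the guessed rule, not documented Intel behaviour; the function name `of_as_if_shift_by_one` is hypothetical, and it applies only to non-zero counts, since a count of 0 leaves FLAGS unmodified.

```python
def of_as_if_shift_by_one(value, width=8):
    """Guessed Skylake rule for any non-zero count (an assumption):
    OF = input[MSB] != input[MSB-1], independent of the shift count."""
    msb = (value >> (width - 1)) & 1
    next_bit = (value >> (width - 2)) & 1
    return msb ^ next_bit


# Inputs observed on the Skylake above, for any non-zero count.
for al in (0x7F, 0x80, 0x03, 0xC0):
    print(f"AL={al:#04x} -> OF={of_as_if_shift_by_one(al)}")
# AL=0x7f -> OF=1
# AL=0x80 -> OF=1
# AL=0x03 -> OF=0
# AL=0xc0 -> OF=0
```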

Of course, other microarchitectures from other vendors can be different. As could pure-software x86 emulators which still comply with the on-paper spec.

My AMD Ryzen 1700 seems to set OF based on the final bit shift. It produces OF=1 for AL=3 and CL=6. Similarly for AL=0xc0 and CL=2. Conversely AL=0x7f and CL=2 gives no overflow.
@Jester: Interesting that it's different from Intel Skylake. Just as easy for HW to implement, though; it already has to get the last bit shifted out for CF, which is always set that way even for count > 1. So it's just OF = CF ^ SF on AMD.
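
For comparison, the OF = CF ^ SF rule suggested for the Ryzen can be modelled the same way. This is a sketch of that inferred rule only, with a hypothetical function name, and it is not documented AMD behaviour.

```python
def shl_flags_cf_xor_sf(value, count, width=8):
    """Model `value << count` with OF = CF ^ SF (inferred rule, an assumption).

    Returns (result, cf, sf, of).
    """
    if not 1 <= count <= width:
        raise ValueError("count must satisfy 1 <= count <= width")
    mask = (1 << width) - 1
    value &= mask
    result = (value << count) & mask
    cf = (value >> (width - count)) & 1  # last bit shifted out
    sf = result >> (width - 1)           # MSB of the result
    return result, cf, sf, cf ^ sf


print(shl_flags_cf_xor_sf(0x03, 6))  # (192, 0, 1, 1): OF=1, as reported
print(shl_flags_cf_xor_sf(0xC0, 2))  # (0, 1, 0, 1):   OF=1, as reported
print(shl_flags_cf_xor_sf(0x7F, 2))  # (252, 1, 1, 0): OF=0, as reported
```

For a count of 1 this gives the same answer as the shift-by-1 rule guessed for Skylake; the two only diverge for larger counts, where the documentation leaves OF undefined.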
