Arithmetic identities and EFLAGS (emulate SUB using NOT and ADD?)

Question

Since −x = not(x) + 1, which implies a − b = a + not(b) + 1, would

sub rax, rcx

be equivalent to

mov temp, rcx
not temp
add rax, temp
add rax, 1

where temp is some register considered to be volatile?

In other words, does the latter affect EFLAGS in the exact same way? If not, how can it be forced to?

Although you seem to recognise it, given that it's part of your question, I do want to note that there's no reason to assume that the effect on EFLAGS would be the same. It being mathematically equivalent doesn't suggest that the effect on the state of the processor would be identical (ignoring temp).
– Thomas Jager, Commented Jun 5, 2020 at 15:43
Is sub broken in your CPU? - aside: perhaps you could avoid temp with not rcx; add rax, rcx; add rax, 1; not rcx;?
– 500 - Internal Server Error, Commented Jun 5, 2020 at 15:49
@500 No, and I suppose you meant to swap the latter two proposed instructions
– user2039981, Commented Jun 5, 2020 at 15:51
@ThomasJager But theoretically speaking, it could be the case, which is the question.
– user2039981, Commented Jun 5, 2020 at 15:52
@500-InternalServerError: not doesn't affect flags, unlike most other ALU instructions. (felixcloutier.com/x86/not#flags-affected)
– Peter Cordes, Commented Jun 5, 2020 at 15:59

Peter Cordes · Accepted Answer · 2025-10-23 00:11:07Z

8

Yes, that gets the same integer result in RAX.

In other words, does the latter affect EFLAGS in the exact same way?

No. ZF, SF, and PF only depend on the integer result, but CF and OF¹ depend on how you get there. x86's CF carry flag is a borrow output from subtraction. (Unlike some ISAs such as ARM, where subtraction sets the carry flag if there was no borrow.)

Trivial counterexample you could check in your head:
0 - 1 with sub sets CF=1. But your way clears CF.

mov temp, rcx        # no effect on FLAGS
not temp             # no effect on FLAGS, unlike most other x86 ALU instructions
add rax, ~1 = 0xFF..FE     # 0 + anything  clears CF
add rax, 1                 # 0xFF..FE + 1 = 0xFF..FF = -1.  clears CF
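
To check that counterexample without a debugger, here is a minimal Python sketch that models only the result and CF of 64-bit add and sub. The helper names (add64, sub64, not_add_add) are made up for illustration; they are not from any library.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def add64(a, b):
    """Return (result, CF) for a 64-bit ADD: CF is the carry out of bit 63."""
    total = (a & MASK64) + (b & MASK64)
    return total & MASK64, total >> 64

def sub64(a, b):
    """Return (result, CF) for a 64-bit SUB: x86 CF is the borrow (a < b unsigned)."""
    a &= MASK64
    b &= MASK64
    return (a - b) & MASK64, int(a < b)

def not_add_add(rax, rcx):
    """The sequence from the question: mov/not (no flags), add rax,temp, add rax,1."""
    temp = ~rcx & MASK64
    rax, _ = add64(rax, temp)  # CF from this add is overwritten by the next one
    return add64(rax, 1)       # the final CF comes only from the last add

print(sub64(0, 1))        # result is all-ones, CF = 1
print(not_add_add(0, 1))  # same result, but CF = 0
```

The integer results agree for every input; only the CF column differs, because the last add sees nothing of the borrow.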

(Fun fact: not doesn't affect FLAGS, unlike most other ALU instructions including neg. neg sets flags the same as sub from 0. A strange quirk of x86 history. https://www.felixcloutier.com/x86/not#flags-affected)

Footnote 1: AF, the half-carry flag (auxiliary) from the low to high nibble in the low byte, also depends on how you get there. You can't branch on it directly, and x86-64 removed the BCD instructions like aaa that read it, but it's still there in RFLAGS where you can read it with pushf / pop rax for example.

If not, how can it be forced to?

Use different instructions. The easiest and most efficient way to get the desired effect on EFLAGS would be to optimize it back to sub rax, rcx. That's why x86 has sub and sbb instructions. If that's what you want, use it.

Emulating `sub` including its FLAGS output

You definitely need to avoid something like add rax,1 as the last step. That would set CF only if the final result is zero, wrapping from ULONG_MAX = -1.

Doing x -= y as x += -y works for OF in most cases, but not for the most-negative number y = LONG_MIN (1UL<<63), where neg rcx itself overflows.

But CF tells you about the 65-bit full result of a 64 + 64-bit addition or subtraction. 64-bit negation isn't sufficient: x += -y doesn't always set CF opposite of what x -= y would. (With y = 0, both leave CF clear.)
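
Both corner cases are easy to reproduce in a small model. This is a sketch under the assumption that CF and OF follow the usual definitions (unsigned carry/borrow, and signed result not representable); sub_flags and neg_add_flags are hypothetical helper names.

```python
BITS = 64
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)   # bit pattern of LONG_MIN

def signed(x):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << BITS) if x & SIGN else x

def sub_flags(x, y):
    """(result, CF, OF) as x86 SUB defines them."""
    res = (x - y) & MASK
    cf = int(x < y)                                  # unsigned borrow
    of = int(signed(x) - signed(y) != signed(res))   # signed overflow
    return res, cf, of

def neg_add_flags(x, y):
    """(result, CF, OF) of the final ADD in:  neg y ; add x, y."""
    ny = (-y) & MASK
    total = x + ny
    res = total & MASK
    cf = total >> BITS                               # unsigned carry out
    of = int(signed(x) + signed(ny) != signed(res))  # signed overflow
    return res, cf, of

print(sub_flags(5, 3), neg_add_flags(5, 3))        # CF is opposite, OF agrees
print(sub_flags(5, 0), neg_add_flags(5, 0))        # y = 0: CF is 0 both ways
print(sub_flags(1, SIGN), neg_add_flags(1, SIGN))  # y = LONG_MIN: OF differs
```

So inverting CF after neg/add fixes the common case but gets y = 0 wrong, and nothing about that sequence repairs OF for y = LONG_MIN.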

Real ALUs do the +1 with carry-in, not as a separate step

Hardware ALUs normally use a binary Adder–subtractor, which does a single operation that propagates carry through all the bits, not a separate add before or after incrementing. That avoids any need to check and combine carry-out (and signed-overflow) results from two separate operations, and it takes far fewer gates with a shallower critical path.

(Negating the most-negative number overflows; see a previous version of this answer for a failed attempt at emulating with neg/add/cmc which might work for inputs other than LONG_MIN.)

The trick is to feed in a +1 to the carry input of the low bit for subtraction. (Or flip the existing carry input for sbb). The pre-processing of B is just conditionally flipping its bits (with XOR gates) to get ~B as an input to A + ~B + 1 = A - B done with a single add-with-carry.

This construction also makes the overflow and carry outputs from the ALU useful. Signed Overflow is directly usable. The carry output from the ALU is set if there was no borrow, clear if there was. So x86 needs to invert that ALU output to get CF, unlike for addition where it uses it directly. (ARM can always just use that ALU output directly since it has opposite semantics for CF from subs/cmp.)

For example with small positive inputs like 0x05 + (~0x03) + 1 = 0x02 : the binary addition wrapped past zero to a small unsigned value, i.e. it had a carry-out. But 5 - 3 = +2 has no borrow. Conversely, 5 - 6 = -1 does have a borrow, because 0x05 + (~0x06) + 1 = 0xFF = -1 doesn't wrap.
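
The same arithmetic as a minimal 8-bit Python sketch of an adder-subtractor (adc8 and alu_sub8 are illustrative names, and 8 bits is only chosen to keep the numbers readable):

```python
def adc8(a, b, carry_in):
    """8-bit add-with-carry: returns (result, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8

def alu_sub8(a, b):
    """a - b as a + ~b + 1 in one pass: returns (result, carry_out, borrow)."""
    res, carry_out = adc8(a, ~b & 0xFF, 1)
    return res, carry_out, carry_out ^ 1   # x86 CF is the inverted carry-out

print(alu_sub8(5, 3))  # (2, 1, 0): carry-out, so no borrow
print(alu_sub8(5, 6))  # (255, 0, 1): no carry-out, so borrow
```

The middle value is what an ARM-style carry flag would hold after a subtraction; the last one is what x86 puts in CF.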

; Emulate  SUB RAX, RCX.   temp can be RDX for example
   mov  temp, rcx
   not  temp
   stc             ; CF = 1
   adc  rax, temp  ; rax += ~rcx + 1
   cmc             ; CF = !CF  like x86 sub does
; all flags except AF set like  sub rax, rcx

To perfectly emulate sub we'd also need to flip the AF bit (nibble-carry from bit #3 to bit #4). If you care, perhaps pushf / xor qword [rsp], 0x11 (flip CF and AF) / popf, but that's slow. lahf / xor ah, 0x11 / sahf isn't an option: the xor clears OF, which is unfortunately outside the low 8 bits of FLAGS so it isn't restored by sahf.
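
As a sanity check on "all flags except AF", here is a sketch that models both sequences at 8-bit width and compares every arithmetic flag over all 65536 input pairs. It assumes the flag logic is the same at any operand size; the function names are hypothetical and the flag formulas are the standard textbook ones, not taken from a vendor manual.

```python
def flags(res, cf, af, of):
    """Pack the six arithmetic flags for an 8-bit result."""
    return {"CF": cf, "AF": af, "OF": of,
            "ZF": int(res == 0), "SF": res >> 7,
            "PF": int(bin(res).count("1") % 2 == 0)}

def sub8(a, b):
    """Reference model of an 8-bit SUB."""
    res = (a - b) & 0xFF
    cf = int(a < b)                          # borrow out of bit 7
    af = int((a & 0xF) < (b & 0xF))          # borrow out of bit 3
    of = (((a ^ b) & (a ^ res)) >> 7) & 1    # input signs differ, result sign flipped
    return res, flags(res, cf, af, of)

def emulated_sub8(a, b):
    """Model of  not temp / stc / adc / cmc  at 8-bit width."""
    t = ~b & 0xFF                            # not: no flags touched
    total = a + t + 1                        # stc, then adc
    res = total & 0xFF
    cf = total >> 8                          # carry out of bit 7
    af = ((a & 0xF) + (t & 0xF) + 1) >> 4    # carry out of bit 3
    of = ((~(a ^ t) & (a ^ res)) >> 7) & 1   # input signs equal, result sign flipped
    return res, flags(res, cf ^ 1, af, of)   # cmc flips only CF

def mismatched_flags(a, b):
    """Names of the flags where the emulation disagrees with SUB."""
    r1, f1 = sub8(a, b)
    r2, f2 = emulated_sub8(a, b)
    assert r1 == r2
    return {name for name in f1 if f1[name] != f2[name]}

print(all(mismatched_flags(a, b) == {"AF"}
          for a in range(256) for b in range(256)))
```

In this model AF is wrong for every input, not just some, which is why a single unconditional flip of AF would be enough to finish the job.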

Bonus: emulating `sbb` is convenient due to no-FLAGS `not`

; Emulate  SBB RAX, RCX.   Using RDX as a temporary
   mov  rdx, rcx
   not  rdx        ; leaves CF unchanged
   cmc             ; CF = !CF.  no borrow-in (CF=0) becomes carry-in=1, like for SUB
   adc  rax, rdx   ; rax += ~rcx + CF
   cmc             ; CF = !CF
; all flags except AF set like  sbb rax, rcx
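
A minimal Python sketch of why the two cmc instructions are enough, modelling only the result and CF (sbb64 and emulated_sbb64 are illustrative names):

```python
MASK64 = (1 << 64) - 1

def sbb64(a, b, cf_in):
    """Reference x86 SBB: a - b - CF. Returns (result, CF_out)."""
    diff = a - b - cf_in
    return diff & MASK64, int(diff < 0)

def emulated_sbb64(a, b, cf_in):
    """Model of  mov/not ; cmc ; adc ; cmc."""
    t = ~b & MASK64          # mov + not: CF untouched
    cf = cf_in ^ 1           # cmc: borrow-in becomes carry-in
    total = a + t + cf       # adc
    cf = (total >> 64) ^ 1   # cmc: carry-out becomes borrow-out
    return total & MASK64, cf

for a, b in [(0, 0), (0, 1), (5, 3), (3, 5), (0, MASK64), (MASK64, MASK64)]:
    for cf_in in (0, 1):
        assert emulated_sbb64(a, b, cf_in) == sbb64(a, b, cf_in)
print("result and CF match for the sampled inputs")
```

The identity behind it is a + ~b + (1 - CF) = a - b - CF + 2^64, so the carry out of bit 63 is set exactly when the true difference is non-negative.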

Steve Morse says he missed not as a FLAGS-affecting instruction when drafting the ISA on paper. So that's what the hardware designers implemented. If the original 8086's ALU could do a true sbb operation (with XOR gates to conditionally flip one of the inputs to the adders), rather than emulating it with a microcoded not, this is probably just an accident. But one can imagine why the HW architects maybe thought it was intended and didn't double-check the spec with Steve. (Apparently the 8086 ISA was fully designed on paper first, then implemented.)

answered Jun 5, 2020 at 16:24

Peter Cordes


21 Comments

Dan Over a year ago

Hi Peter, when you do a 0-1 in binary, where are you borrowing from? There is no bit/number on the left side of 0, so where is the "borrowed" 1 coming from? Is this where a flag gets set to indicate a non-existent borrow? I found this post which explains my question but the answers are not sufficient: stackoverflow.com/questions/46570941/…

Peter Cordes Over a year ago

@Dan: From the next higher bit position. Just like when you add 1+1 in binary, there's not enough room to store the 10 result, so the result for that bit is 0 with a carry-out of 1. (en.wikipedia.org/wiki/Adder_(electronics)#Full_adder). 0-1 = 1 with a borrow output of 1. en.wikipedia.org/wiki/Subtractor#Full_subtractor

Dan Over a year ago

But the next higher bit position is 0 in this case. Assume we have an 8 bit cpu then run 0b00000000 - 0b00000001. first inputs bits are all 0s, nothing to borrow from?

Peter Cordes Over a year ago

@Dan: Right, 0b10 - 0b01 = 0b01, including a borrow into the 2nd bit from the 0-1 in the low bit. But there's no further borrow out of that bit, so the borrow output of the whole 2-bit subtraction (out of the most-significant full-subtractor) is 0. In 0 - 1, there is a borrow output from the whole thing. i.e. the left hand operand was unsigned-below the right-hand operand. i.e. 0 - 1 = -1 doesn't have anything to borrow from, so the result is negative. i.e. 0b11 (no signed overflow, just unsigned carry).

Peter Cordes Over a year ago

@Dan: Think through the logic of a 1-bit full subtractor (en.wikipedia.org/wiki/Subtractor#Full_subtractor) and how that works on its own. Keep in mind that borrow (like carry) propagates from LSB to MSB; it doesn't matter what's there to borrow from. Once you understand the details of what actually happens, you can start thinking about how to assign mathematical meaning to those bits for unsigned or signed 2's complement interpretations of those bits. (A binary sign/magnitude would work differently).
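
For anyone who wants to step through that, here is a small Python sketch of a ripple-borrow subtractor built from 1-bit full subtractors. The function names and the default width of 8 bits are assumptions for illustration; the borrow equation is the standard full-subtractor one.

```python
def full_subtractor(a, b, borrow_in):
    """1-bit full subtractor: returns (difference, borrow_out)."""
    diff = a ^ b ^ borrow_in
    not_a = a ^ 1
    borrow_out = (not_a & b) | (not_a & borrow_in) | (b & borrow_in)
    return diff, borrow_out

def ripple_sub(a, b, bits=8):
    """Chain full subtractors from LSB to MSB: returns (result, final borrow)."""
    borrow = 0
    result = 0
    for i in range(bits):
        d, borrow = full_subtractor((a >> i) & 1, (b >> i) & 1, borrow)
        result |= d << i
    return result, borrow

print(ripple_sub(0, 1))                # (255, 1): borrow ripples out of the top bit
print(ripple_sub(0b10, 0b01, bits=2))  # (1, 0): borrow is absorbed by bit 1
```

Each stage only looks at its own two input bits and the incoming borrow, so 0 - 1 never needs something "to borrow from": the borrow simply falls out of the most-significant stage, and that output is what x86 stores in CF.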


Nate Eldredge · Accepted Answer · 2020-06-05 16:13:01Z

No, they're not equivalent. For instance if rax = 1 and rcx = 3, then sub rax, rcx will set the carry flag, because you are subtracting a larger number from a smaller one. But in your second sequence of instructions, following add rax, temp, rax will contain -3 (i.e. 0xfffffffffffffffd), and adding 1 to -3 does not cause a carry. So after your second sequence of instructions, the carry flag would be cleared.

I do not know of any simple way to exactly emulate the behavior of sub including its effect on flags (other than by using cmp, but that's cheating because it's really just sub under the hood). In principle, you could write a long sequence of instructions that manually did all the same tests that sub does internally (referring to its precise description in the instruction set manual), and set the flags at the end using sahf or popf or the like.

This would be a lot of work, especially if you are not going to use cmp, and I am not going to go through it for this answer. Especially because I also can't think of any reason why one would need to do it, except as a fairly pointless exercise.
