GNU x86 assembler wrong jump encoding?

Question

I create a file named "test.s" with the content "jmp -2". Then I execute "as --32 -o test.o test.s" and "objdump -D test.o".
(I don't want to discuss here that a jump of -2 obviously leads to an endless loop.)
I get "e9 fa ff ff ff". Shouldn't it be "eb fe" instead? Because the 2's complement representation of -2 is 0xfe and for optimization purposes, a short jump should be used instead of a near jump, right?
Then I tried to use "echo "ebfe" | xxd -r -p - test2.bin" and used "hexdump test2.bin" to verify that the content of the file is indeed 0xebfe. But when I use "objdump -D -b binary -m i386 test2.bin", I get "jmp 0x0". Why?
For context, here is a short excerpt from the Intel Developer Manual:

The offset is relative to the start of the next instruction. Also, you used an absolute address. If you want an endless short jump (eb fe), use "jmp ."; if you want to jump back two bytes, use "jmp . - 2", but that won't be an endless loop. "jmp -2" is a jump to absolute address -2. Since the assembler does not know absolute addresses, it uses a 4-byte offset and emits a relocation entry that the linker will fill in with the proper offset. You can add the -r option to your objdump command to see relocation entries.
– Jester, Commented Apr 5 at 17:33
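The rule in Jester's comment (the displacement is counted from the end of the jump instruction) can be sketched in Python. This is a minimal illustration, not how GAS works internally. The function name encode_jmp is made up, and the sketch assumes the target address is already known at assembly time, which is exactly what is not true for "jmp -2" in an unlinked object file. It picks the 2-byte EB rel8 form when the displacement fits in a signed byte and otherwise falls back to the 5-byte E9 rel32 form.

```python
def encode_jmp(instr_addr: int, target: int) -> bytes:
    """Encode an x86 relative JMP from instr_addr to a known target address.

    The displacement is measured from the END of the jump instruction,
    i.e. target - (instr_addr + instruction_length).
    """
    # Try the 2-byte short form first: EB rel8.
    rel8 = target - (instr_addr + 2)
    if -128 <= rel8 <= 127:
        return bytes([0xEB]) + rel8.to_bytes(1, "little", signed=True)
    # Fall back to the 5-byte near form: E9 rel32.
    rel32 = target - (instr_addr + 5)
    if not -2**31 <= rel32 < 2**31:
        raise ValueError("target out of rel32 range")
    return bytes([0xE9]) + rel32.to_bytes(4, "little", signed=True)


# "jmp ." at address 0: the target is the jump's own address.
print(encode_jmp(0, 0).hex(" "))        # eb fe
# "jmp . - 2" at address 0x10: the target is two bytes before the jump.
print(encode_jmp(0x10, 0x0E).hex(" "))  # eb fc
# A target too far away for rel8 needs the E9 form.
print(encode_jmp(0, 0x1000).hex(" "))   # e9 fb 0f 00 00
```

Under this rule, "jmp ." at address 0 becomes eb fe because the next instruction starts at address 2, and 0 - 2 = -2, which is 0xfe as a signed byte.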
Thanks, I overlooked that. But why do I get "jmp 0x0" in objdump when I use "jmp ." in the source? Why zero, and not -2 or "."?
– user1994405, Commented Apr 5 at 17:43
@user1994405: Re the behavior of ".": it's a documented feature of the GNU assembler: sourceware.org/binutils/docs/as/Dot.html. The syntax of the assembler is determined by the assembler's author, not by Intel, who only define the instruction set.
– Nate Eldredge, Commented Apr 5 at 17:53
Use objdump -drwC (optionally with -Mintel) to show relocation entries. The 0 is a placeholder in an unlinked .o, as Nate says.
– Peter Cordes, Commented Apr 5 at 18:05
@user1994405: Oh, you're talking about the second part of your question, where you manually encoded "jmp ."; that doesn't have a relocation entry or placeholder. I was talking about the first part, where you used "jmp -2" in the asm source. For that, objdump -drwC -Mintel shows:

    0: e9 00 00 00 00    jmp 0x5    1: R_X86_64_PC32 *ABS*-0x6

(That's all one line in the disassembly output. I assembled it for 64-bit mode, hence the R_X86_64_PC32 relocation type, which is program-counter-relative. *ABS* is how GNU Binutils refers to addresses that aren't relative to a section.)
– Peter Cordes, Commented Apr 5 at 20:39
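The relocation in that output can be checked by hand. The ELF x86 PC-relative relocations (R_386_PC32 and R_X86_64_PC32) compute S + A - P, where S is the symbol value (0 for *ABS*), A is the addend and P is the address of the 4-byte field being patched. The Python sketch below uses two made-up load addresses and shows that the linker-computed rel32 always makes the jump land on absolute address -2 (0xFFFFFFFE once wrapped to 32 bits). It also suggests why the question's --32 build shows "e9 fa ff ff ff" rather than zeros: i386 ELF uses REL relocations, which keep the addend (-6) in the instruction bytes, whereas x86-64 uses RELA and leaves 0 there. The helper names are hypothetical.

```python
def pc32_value(S: int, A: int, P: int) -> int:
    """Value stored for an R_386_PC32 / R_X86_64_PC32 relocation: S + A - P."""
    value = S + A - P
    if not -2**31 <= value < 2**31:
        raise OverflowError("relocation truncated to fit in 32 bits")
    return value


def e9_target(instr_addr: int, rel32: int) -> int:
    """Address an E9 rel32 jump at instr_addr lands on (instruction is 5 bytes)."""
    return instr_addr + 5 + rel32


# From the disassembly: relocation at offset 1 against *ABS* (S = 0), addend -6.
S, A = 0, -6

# i386 uses REL relocations, so the addend lives in the instruction bytes:
# the question's "e9 fa ff ff ff" holds exactly this -6.
assert int.from_bytes(bytes.fromhex("faffffff"), "little", signed=True) == A

# Hypothetical addresses where a linker might place the jmp.
for base in (0x0, 0x401000):
    P = base + 1                  # the rel32 field follows the one-byte E9 opcode
    rel32 = pc32_value(S, A, P)
    print(hex(base), rel32, e9_target(base, rel32))  # target is -2 both times
```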

user1994405 · Accepted Answer · 2025-04-05 18:31:45Z


As @Nate Eldredge pointed out, running assembly code through "as" and then "objdump" doesn't give back the same code. "objdump" unfortunately messes with jump offsets. But my assumption that 0xebfe is an endless loop is correct.

It seems that objdump converts all relative jumps to absolute ones by displaying an offset relative to the start of the first instruction in the file. I verified that by calling "as" with a file that contains "jmp ." multiple times.
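The arithmetic a disassembler performs for this experiment can be sketched as follows. This is a toy decoder that understands only EB rel8, not objdump's implementation, and the function name is made up. It shows that every "jmp ." has the same bytes (rel8 = -2); the printed targets differ only because each copy sits at a different address, counted from 0 at the start of the section or raw file.

```python
def disassemble_short_jumps(code: bytes, base: int = 0) -> list:
    """Decode a buffer containing only EB rel8 jumps, objdump-style.

    Each line shows the instruction's address, its bytes, and the absolute
    jump target, computed as (address of next instruction) + rel8.
    """
    lines = []
    addr = base
    i = 0
    while i < len(code):
        if code[i] != 0xEB or i + 1 >= len(code):
            raise ValueError(f"unsupported or truncated instruction at offset {i}")
        rel8 = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
        next_addr = addr + 2
        target = next_addr + rel8
        lines.append(f"{addr:x}: {code[i:i + 2].hex(' ')}  jmp 0x{target:x}")
        addr = next_addr
        i += 2
    return lines


# Three copies of "jmp .": identical bytes, different addresses.
for line in disassemble_short_jumps(bytes.fromhex("ebfe" * 3)):
    print(line)
# 0: eb fe  jmp 0x0
# 2: eb fe  jmp 0x2
# 4: eb fe  jmp 0x4
```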

edited Apr 5 at 18:31

answered Apr 5 at 18:18

user1994405



Peter Cordes Apr 5 at 18:52

"objdump unfortunately messes with jump offsets": no, it doesn't; that's not a useful description of, or way to think about, what the tools are doing. It disassembles the machine code that's there, showing jump targets as (absolute) addresses; the relative encoding is there in the machine-code hexdump. In a .o (an object file that isn't linked yet), the machine code is just a placeholder (rel32=0) if the relative displacement couldn't be calculated at assemble time. For example, with jmp -2 (a jump to absolute address 0xFFFFFFFE), the rel32 depends on where the linker puts the instruction.

user1994405 Apr 5 at 19:46

As I explained above, I'm developing an x86 assembler from scratch for learning purposes, so I need a disassembler to verify that my generated instructions are correct. Although I now understand how objdump handles these things, it is simply wrong from an objective point of view. 0xebfe is a jump to the relative address -2, not a jump to some absolute address. The fact that objdump says "this is an absolute jump" in its output is simply wrong.

Peter Cordes Apr 5 at 20:29

The convention in asm source (which disassemblers create) is that the operand for a jump is the destination. It would sometimes be nice to have a disassembler that would produce jmp rel8 = -2 or something, especially for ISAs where branch-target decoding is less trivial (e.g. MIPS or ARM), but that's not what disassemblers do. objconv -fgasm foo.o foo.s (from agner.org/optimize) will create commented GAS source with branch targets that use autonumbered labels instead of numeric addresses.
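The optional output mode Peter describes, showing the raw displacement as well as the resolved target, could look like the hypothetical describe_jmp below. It is a sketch that handles only the two relative JMP forms (EB rel8 and E9 rel32), and its output syntax is invented for illustration. The second call reproduces the "e9 00 00 00 00 ... jmp 0x5" placeholder from Peter's earlier disassembly.

```python
def describe_jmp(code: bytes, addr: int) -> str:
    """Show a JMP both as its raw displacement and as the resolved target.

    Only the two relative forms are handled: EB rel8 and E9 rel32.
    """
    if code[0] == 0xEB:
        size, width = 2, 8
    elif code[0] == 0xE9:
        size, width = 5, 32
    else:
        raise ValueError("not an EB/E9 jump")
    if len(code) < size:
        raise ValueError("truncated instruction")
    rel = int.from_bytes(code[1:size], "little", signed=True)
    target = addr + size + rel  # the displacement counts from the next instruction
    return f"jmp rel{width}={rel}  ; target 0x{target:x}"


print(describe_jmp(bytes.fromhex("ebfe"), 0))        # jmp rel8=-2  ; target 0x0
print(describe_jmp(bytes.fromhex("e900000000"), 0))  # jmp rel32=0  ; target 0x5
```

Both pieces of information come from the same bytes; a conventional disassembler simply chooses to print only the target.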

Peter Cordes Apr 5 at 20:33

@ecm: Yeah, that's true for x86 because relative branches are the only kind encodeable for near jumps. Not true for MIPS, where j is section-absolute (replace the low 28 bits of the program counter with an immediate from the instruction), vs. b (like beq $zero, $zero, target) uses a 16-bit relative displacement. So in general across ISAs, tools for looking at machine code could do a better job of showing you how branches encode the target, as an optional output mode for people wanting to play with snippets of machine code and understand encodings. But that doesn't make objdump "wrong".
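To make the MIPS contrast concrete, here is a sketch of how the two kinds of targets are computed on classic 32-bit MIPS; the function names are made up. One detail the comment glosses over: both forms work from PC + 4, the address of the branch-delay slot, rather than from the branch itself. J keeps the top 4 bits of that address and replaces the low 28 bits with the 26-bit field shifted left by 2, while B/BEQ adds a sign-extended 16-bit word offset.

```python
def mips_j_target(pc: int, instr_index: int) -> int:
    """J: keep the top 4 bits of the delay-slot address, replace the low 28 bits."""
    if not 0 <= instr_index < 1 << 26:
        raise ValueError("instr_index is a 26-bit field")
    return ((pc + 4) & 0xF000_0000) | (instr_index << 2)


def mips_b_target(pc: int, offset16: int) -> int:
    """B / BEQ: signed 16-bit word displacement relative to the delay-slot address."""
    if not 0 <= offset16 < 1 << 16:
        raise ValueError("offset16 is a 16-bit field")
    if offset16 & 0x8000:            # sign-extend the 16-bit field
        offset16 -= 1 << 16
    return (pc + 4 + (offset16 << 2)) & 0xFFFF_FFFF


pc = 0x0040_0100
print(hex(mips_j_target(pc, 0x10_0000)))  # 0x400000: absolute within the 256 MiB region
print(hex(mips_b_target(pc, 0xFFFF)))     # 0x400100: offset -1 word, branch to itself
```

The last line is the MIPS counterpart of eb fe: a relative branch whose encoded offset is negative, landing back on itself.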

Nate Eldredge Apr 6 at 15:49

So do you have different mnemonics for the "load form" and "store form" of e.g. add reg, reg, or find some other way to resolve the ambiguity?
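Nate's point is that x86 itself has redundant encodings: a 32-bit register-to-register add can use either opcode 01 /r (ADD r/m32, r32) or opcode 03 /r (ADD r32, r/m32), and both disassemble to the same "add eax, ecx". The sketch below builds both byte sequences so the difference is visible; the helper names and the "load"/"store" labels (borrowed from the comment) are illustrative only.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm_reg_reg(reg: int, rm: int) -> int:
    """ModRM byte for a register-direct operand pair (mod = 0b11)."""
    return 0b11 << 6 | reg << 3 | rm


def encode_add(dst: str, src: str, form: str) -> bytes:
    """Encode 32-bit 'add dst, src' in either of x86's two register forms.

    form="store": opcode 01 /r, ADD r/m32, r32  (dst goes in ModRM.rm)
    form="load":  opcode 03 /r, ADD r32, r/m32  (dst goes in ModRM.reg)
    """
    d, s = REGS.index(dst), REGS.index(src)
    if form == "store":
        return bytes([0x01, modrm_reg_reg(reg=s, rm=d)])
    if form == "load":
        return bytes([0x03, modrm_reg_reg(reg=d, rm=s)])
    raise ValueError("form must be 'load' or 'store'")


print(encode_add("eax", "ecx", "store").hex(" "))  # 01 c8
print(encode_add("eax", "ecx", "load").hex(" "))   # 03 c1
```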
