Smallest executable program (x86-64 Linux)

Question

I recently came across this post describing the smallest possible ELF executable for Linux, however the post was written for 32 bit and I was unable to get the final version to compile on my machine.

This brings me to the question: what's the smallest x86-64 ELF executable it's possible to write that runs without error?

Abusing or violating the ELF specification is ok as long as current Linux kernels in practice will run it, as with the last couple versions of the MuppetLabs 32-bit teensy ELF executable article.

What machine do you have? Windows Subsystem for Linux (which doesn't support 32-bit executables at all)? Or a proper Linux kernel built without IA-32 compat? What do you mean you couldn't get the final version to even compile? Surely you got a binary file, but couldn't run it? (Anyway, I know your question isn't about that, but if you couldn't even compile the 32-bit version, you probably won't be able to use NASM's flat-binary output to create a 64-bit executable with code packed into the ELF headers either.)
– Peter Cordes, Commented Nov 19, 2018 at 21:10
Can you use 32-bit int 0x80 system calls in your 64-bit executable? If so, you probably don't need to change much. I know there's some overlap of ELF header fields being interpreted as part of the machine code, so some change might be needed for ELF64.
– Peter Cordes, Commented Nov 19, 2018 at 21:13
For 64-bit mode, you basically need to recreate the entire program, as both the machine code and the layout of the ELF header are quite different. While this is a nice exercise for an experienced programmer, I'm not sure if you are going to get an answer to your question within the scope of this site.
– fuz, Commented Nov 19, 2018 at 21:33
I'm voting to close this question as off-topic because code golf questions are off-topic on StackOverflow.
– Ross Ridge, Commented Nov 19, 2018 at 22:32
This is not just a "code golf" question IMO; it has practical value as well. I came here because I was interested in writing a tiny assembly program by hand, and was looking for a starting point.
– Brandon, Commented Oct 23, 2022 at 1:05

Matteo Italia · Accepted Answer · 2018-11-19 22:25:49Z

18

Starting from an answer of mine about the "real" entrypoint of an ELF executable on Linux and "raw" syscalls, we can strip it down to

bits 64
global _start
_start:
   mov di,42        ; only the low byte of the exit code is kept,
                    ; so we can use di instead of the full edi/rdi
   xor eax,eax
   mov al,60        ; shorter than mov eax,60
   syscall          ; perform the syscall

I don't think you can get it to be any smaller without going out of specs - in particular, the psABI doesn't guarantee anything about the state of eax. This gets assembled to precisely 10 bytes (as opposed to the 7 bytes of the 32 bit payload):

66 bf 2a 00 31 c0 b0 3c 0f 05
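As a quick sanity check on that byte count, the dump can be split per instruction and summed. This is only a sketch: the grouping of bytes into instructions is an assumption based on the usual x86-64 encodings, not assembler output.

```python
# Per-instruction split of the dump above. The grouping is an assumption
# based on the standard x86-64 encodings; the bytes are copied from the dump.
PAYLOAD = [
    ("mov di, 42",   bytes.fromhex("66bf2a00")),  # operand-size prefix + imm16
    ("xor eax, eax", bytes.fromhex("31c0")),
    ("mov al, 60",   bytes.fromhex("b03c")),
    ("syscall",      bytes.fromhex("0f05")),
]

payload = b"".join(code for _, code in PAYLOAD)

for text, code in PAYLOAD:
    print(f"{text:<14}{code.hex(' '):<13}{len(code)} bytes")
print("total:", len(payload), "bytes")
```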

The straightforward way (assemble with nasm, link with ld) gives me a 352-byte executable.

The first "real" transformation in the 32-bit article is building the ELF "by hand"; doing this (with some modifications, as the ELF header for x86_64 is a bit bigger):

bits 64
            org 0x08048000

ehdr:                                           ; Elf64_Ehdr
            db  0x7F, "ELF", 2, 1, 1, 0         ;   e_ident
    times 8 db  0
            dw  2                               ;   e_type
            dw  62                              ;   e_machine
            dd  1                               ;   e_version
            dq  _start                          ;   e_entry
            dq  phdr - $$                       ;   e_phoff
            dq  0                               ;   e_shoff
            dd  0                               ;   e_flags
            dw  ehdrsize                        ;   e_ehsize
            dw  phdrsize                        ;   e_phentsize
            dw  1                               ;   e_phnum
            dw  0                               ;   e_shentsize
            dw  0                               ;   e_shnum
            dw  0                               ;   e_shstrndx

ehdrsize    equ $ - ehdr

phdr:                                           ; Elf64_Phdr
            dd  1                               ;   p_type
            dd  5                               ;   p_flags
            dq  0                               ;   p_offset
            dq  $$                              ;   p_vaddr
            dq  $$                              ;   p_paddr
            dq  filesize                        ;   p_filesz
            dq  filesize                        ;   p_memsz
            dq  0x1000                          ;   p_align

phdrsize    equ     $ - phdr

_start:
   mov di,42        ; only the low byte of the exit code is kept,
                    ; so we can use di instead of the full edi/rdi
   xor eax,eax
   mov al,60        ; shorter than mov eax,60
   syscall          ; perform the syscall

filesize      equ     $ - $$

we get down to 130 bytes. This is a tad bigger than the 91-byte executable at the same stage of the 32-bit article, but the difference comes from several header fields becoming 64 bits wide instead of 32 (plus the 3 extra payload bytes).
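The size accounting can be reproduced with Python's struct module. This is a rough sketch: the format strings are transcribed from the ELF specification's header layouts, and the payload sizes are the ones quoted in this answer.

```python
import struct

# Field layouts transcribed from the ELF specification
# ("<" = little-endian with no padding between fields).
LAYOUTS = {
    "ELF32": ("<16sHHIIIIIHHHHHH", "<IIIIIIII"),   # Elf32_Ehdr, Elf32_Phdr
    "ELF64": ("<16sHHIQQQIHHHHHH", "<IIQQQQQQ"),   # Elf64_Ehdr, Elf64_Phdr
}
PAYLOAD = {"ELF32": 7, "ELF64": 10}   # payload sizes quoted in this answer

totals = {}
for name, (ehdr, phdr) in LAYOUTS.items():
    e, p = struct.calcsize(ehdr), struct.calcsize(phdr)
    totals[name] = e + p + PAYLOAD[name]
    print(f"{name}: {e} + {p} + {PAYLOAD[name]} = {totals[name]}")
```

The 39-byte gap splits into 12 bytes of ELF header, 24 bytes of program header, and 3 bytes of payload.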

We can then apply some tricks similar to those in the article; the partial overlap of phdr and ehdr can be done, although the order of fields in phdr is different, and we have to overlap p_flags with e_shnum (which, however, should be ignored because e_shentsize is 0).

Moving the code inside the header is slightly more difficult, as it's 3 bytes larger, but that part of the header is just as big as in the 32-bit case. We overcome this by starting 2 bytes earlier, overwriting the padding byte (ok) and the ABI version field (not ok, but still works).

So, we reach:

bits 64
            org 0x08048000

ehdr:                                           ; Elf64_Ehdr
            db  0x7F, "ELF", 2, 1               ;   e_ident (first 6 bytes; the code below fills the other 10)
_start:
            mov di,42        ; only the low byte of the exit code is kept,
                            ; so we can use di instead of the full edi/rdi
            xor eax,eax
            mov al,60        ; shorter than mov eax,60
            syscall          ; perform the syscall
            dw  2                               ;   e_type
            dw  62                              ;   e_machine
            dd  1                               ;   e_version
            dq  _start                          ;   e_entry
            dq  phdr - $$                       ;   e_phoff
            dq  0                               ;   e_shoff
            dd  0                               ;   e_flags
            dw  ehdrsize                        ;   e_ehsize
            dw  phdrsize                        ;   e_phentsize
phdr:                                           ; Elf64_Phdr
            dw  1                               ;   e_phnum         p_type
            dw  0                               ;   e_shentsize
            dw  5                               ;   e_shnum         p_flags
            dw  0                               ;   e_shstrndx
ehdrsize    equ $ - ehdr
            dq  0                               ;   p_offset
            dq  $$                              ;   p_vaddr
            dq  $$                              ;   p_paddr
            dq  filesize                        ;   p_filesz
            dq  filesize                        ;   p_memsz
            dq  0x1000                          ;   p_align

phdrsize    equ     $ - phdr
filesize    equ     $ - $$

which is 112 bytes long.
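For readers without NASM at hand, the same 112-byte layout can be rebuilt with Python's struct module. This is a sketch that mirrors the listing above field by field; the constants 64, 56 and 112 stand in for ehdrsize, phdrsize and filesize, and the payload bytes are the 10-byte dump from earlier in this answer.

```python
import struct

BASE = 0x08048000                                   # same value as the org directive
code = bytes.fromhex("66bf2a00" "31c0" "b03c" "0f05")  # the 10-byte payload

ident = b"\x7fELF" + bytes([2, 1])   # magic, ELFCLASS64, little-endian
entry = BASE + len(ident)            # _start sits at file offset 6
phoff = 56                           # phdr overlaps the last 8 bytes of ehdr

image = ident + code                 # 16 bytes: e_ident with code inside
image += struct.pack(
    "<HHIQQQIHH",
    2,       # e_type = ET_EXEC
    62,      # e_machine = EM_X86_64
    1,       # e_version
    entry,   # e_entry
    phoff,   # e_phoff
    0,       # e_shoff
    0,       # e_flags
    64,      # e_ehsize
    56,      # e_phentsize
)
# e_phnum/e_shentsize double as p_type, e_shnum/e_shstrndx as p_flags
image += struct.pack("<HHHH", 1, 0, 5, 0)
# p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
image += struct.pack("<QQQQQQ", 0, BASE, BASE, 112, 112, 0x1000)

print(len(image), "bytes")
```

Writing image to a file and marking it executable should give the same binary as the NASM flat output, which makes it easy to experiment with further overlaps by editing individual fields.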

Here I stop for the moment, as I don't have much time for this right now. You now have the basic layout with the relevant modifications for 64 bit, so you just have to experiment with more audacious overlaps.

answered Nov 19, 2018 at 22:25

Matteo Italia

8 Comments

Peter Cordes Over a year ago

If you're golfing for code-size and you still want to _exit(42) instead of xor edi,edi like a normal person, you'd use push 42/pop rdi (3 bytes) instead of a 4-byte 66 mov-di imm16. And then a 3-byte lea eax, [rdi - 42 + 60] or another push/pop. See "Tips for golfing in x86/x64 machine code". Of course in practice Linux does zero all the registers before process startup. Depending on your golfing rules, you might take advantage. (codegolf.SE only requires that code work on at least one implementation, not necessarily all.)

Peter Cordes Over a year ago

To set only the low byte, another option is mov al,42 (2 bytes) /xchg eax,edi (1 byte).

Matteo Italia Over a year ago

@PeterCordes: argh the usual push/pop trick, I keep forgetting it... probably it's because I usually golf in 16 bit x86, where they aren't as useful (except for segment registers). _exit(42) is there to match the original, otherwise I would have just made it exit with whatever happened to be in rdi :-D. Unfortunately, as this is not a "regular" code-golf, there aren't really well-defined rules...

sivizius Over a year ago

I am at 9 Bytes with use64; xor edi, edi; mov al, 42; xchg eax, edi; mov al, 60; syscall?

Matteo Italia Over a year ago

@mercury0114: the code itself is 12 bytes, the rest is various headers, the symbol table, the definition of other standard executable sections and stuff like that. Assembling your code with nasm -felf and linking it with ld -m elf_i386 I get 484 bytes, doing strip -s over the resulting binary gets down to 248 (you can get an idea of the content before/after using objdump -x -D).
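The byte counts quoted in the comments above can be tallied in the same way as the answer's payload. The encodings below are hand-assembled from the standard x86-64 opcode tables, so treat them as an assumption and confirm with an assembler before relying on them.

```python
# Hand-assembled encodings (an assumption, not assembler output).
VARIANTS = {
    "answer: mov di,42; xor eax,eax; mov al,60; syscall":
        "66bf2a00 31c0 b03c 0f05",
    "push 42; pop rdi; lea eax,[rdi-42+60]; syscall":
        "6a2a 5f 8d4712 0f05",
    "xor edi,edi; mov al,42; xchg eax,edi; mov al,60; syscall":
        "31ff b02a 97 b03c 0f05",
}

# bytes.fromhex ignores the spaces used here to separate instructions
sizes = {name: len(bytes.fromhex(hexstr)) for name, hexstr in VARIANTS.items()}

for name, size in sizes.items():
    print(f"{size:2d} bytes  {name}")
```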


Peter Cordes · Accepted Answer · 2024-11-26 19:43:08Z

Most articles out there give up on ld and resort to hand-crafting the ELF headers waaay too soon, including the amazing answer from Matteo Italia.

I've discovered you can reach the 120-ish-byte limit (standard ELF headers plus program code) using only standard tools, with no need to insert the ELF header in your ASM.

Standard assembly code, with a few tricks:

; tiny.asm
BITS 64
SECTION .text align=1
GLOBAL _start
_start:
    ; _exit(42)
    ; Linux zeroes the registers at process start in practice (the psABI does not promise it), so writing only al/dil works
    mov       al, 60  ; Select the _exit syscall (60 on x86-64 Linux)
    mov      dil, 42  ; Set the exit code argument for _exit
    syscall           ; Perform the selected syscall

Remarks:

Using al/dil instead of the common eax/edi or the naive rax/rdi, for a 7-byte code payload. This works because Linux in practice zeroes all registers at the start of a statically linked program, although the psABI does not guarantee it.
align=1 so that ld, with its default linker script for a non-PIE, can pick 0x400078 as the program entry point address, putting the payload right after the ELF headers. As explained by @ecm, NASM's default is align=16, which makes ld insert 8 bytes of padding to get to 0x400080.
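The two entry addresses follow from simple alignment arithmetic. In this sketch the base 0x400000 and the header sizes are taken from the readelf output shown later in this answer; align_up is a helper name made up for the illustration.

```python
BASE = 0x400000          # load address of the single LOAD segment
EHDR, PHDR = 64, 56      # Elf64_Ehdr size, one Elf64_Phdr

def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (hypothetical helper)."""
    return (addr + alignment - 1) // alignment * alignment

headers_end = BASE + EHDR + PHDR
entry_align1 = align_up(headers_end, 1)     # SECTION .text align=1
entry_align16 = align_up(headers_end, 16)   # NASM's default alignment

print(hex(entry_align1), hex(entry_align16), entry_align16 - entry_align1)
```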

And now some fine-tuned command-line arguments:

nasm -f elf64 tiny.asm &&
ld -s -no-pie -z noseparate-code tiny.o -o tiny

Results:

$ wc -c tiny && ./tiny; echo $?
336 tiny
42

That's already better than the 352 bytes Matteo had in his last ld attempt. And the code payload only accounts for 3 of the 16 bytes saved.

But the payload is not the point here. My goal is to get rid of all section headers, so we get to the 120+payload size which is the absolute minimum before manually fiddling with the ELF header.

Given our 7-byte payload, we aim for a 127-byte binary, breaking the ~300 bytes barrier.

$ strip --strip-section-headers tiny && wc -c tiny && ./tiny; echo $?
127 tiny
42

A 62% reduction with a single strip, and goal achieved!

$ readelf -Wa tiny
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400078
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no section groups in this file.

Program Headers:
 Type  Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
 LOAD  0x000000 0x0000000000400000 0x0000000000400000 0x00007f 0x00007f R E 0x1000

There is no dynamic section in this file.

There are no relocations in this file.
No processor specific unwind information to decode

Dynamic symbol information is not available for displaying symbols.

No version information found in this file.

This is a result Matteo and most articles only achieved by pasting the ELF header in ASM and editing the loader address by hand, but now we tamed nasm, ld and strip to do it automatically for us.
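The same accounting can be read straight from the file without readelf. The function below is a minimal sketch, assuming a little-endian ELF64 file in which nothing but the payload follows the headers; summarize is a made-up name, not part of any tool.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, little-endian

def summarize(image: bytes) -> dict:
    """Return the header fields that matter for the size accounting."""
    (ident, _type, _machine, _version, entry, _phoff, shoff, _flags,
     ehsize, phentsize, phnum, _shentsize, shnum, _shstrndx) = EHDR.unpack_from(image)
    if ident[:4] != b"\x7fELF" or ident[4] != 2:
        raise ValueError("not a 64-bit ELF file")
    headers = ehsize + phentsize * phnum
    return {
        "entry": hex(entry),
        "headers": headers,
        "section_headers": shnum,
        "section_header_offset": shoff,
        # Only meaningful when the payload is all that follows the headers.
        "payload": len(image) - headers,
    }
```

Calling summarize(open("tiny", "rb").read()) on the stripped binary should agree with the readelf dump above: 120 bytes of headers, no section headers, and 7 bytes left for the payload.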

And, for completeness, a clone of true with a 4-byte payload that yields an impressive 124 bytes from just 5 lines of source, which I believe is the smallest possible size before non-standard approaches like overlapping headers and embedding the payload in them:

SECTION .text align=1
GLOBAL _start
_start:
    mov       al, 60
    syscall

nasm -f elf64 tiny.asm &&
ld -s -no-pie -z noseparate-code tiny.o -o tiny &&
strip --strip-section-headers tiny && wc -c tiny && ./tiny; echo $?

124 tiny
0

A tiny executable and a tiny source!

Nice Result! I tried to reproduce it and got stuck at 'strip'. It seems like I don't have the '--strip-section-headers' available. Any hint?
@bennib22: It is a fairly recent addition to strip, added in binutils 2.41 (released in July 2023), according to its release notes.
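With an older binutils, a rough stand-in for that strip option is to truncate the file after its last loadable byte and clear the section-header fields. The sketch below is not what strip does internally; it assumes a little-endian ELF64 executable whose section data sits after all PT_LOAD contents (as in the ld -s output above), and drop_section_headers is a made-up name.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr
PT_LOAD = 1

def drop_section_headers(image: bytes) -> bytes:
    """Truncate after the last PT_LOAD byte and zero the section-header fields."""
    fields = list(EHDR.unpack_from(image))
    phoff, phentsize, phnum = fields[5], fields[9], fields[10]
    end = phoff + phentsize * phnum          # keep at least the program headers
    for i in range(phnum):
        p_type, _flags, p_offset, _va, _pa, p_filesz, _memsz, _align = \
            PHDR.unpack_from(image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            end = max(end, p_offset + p_filesz)
    fields[6] = 0     # e_shoff
    fields[11] = 0    # e_shentsize
    fields[12] = 0    # e_shnum
    fields[13] = 0    # e_shstrndx
    return EHDR.pack(*fields) + image[EHDR.size:end]
```

Reading the 336-byte tiny, passing it through this function, and writing the result back should give a 127-byte file under those assumptions; compare it with readelf before trusting it.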

n132 · Answer · answered Jan 30, 2024 at 1:40; last edited by Peter Cordes 2024-01-31 06:12:34Z

3

Updated Answer

After seeing the tricks used in @Matteo Italia's answer, I found it's possible to reach 112 bytes, since we can hide not only the string but also the code in the ELF header.

Explanation: The key idea is hiding everything in the header, including the string "Hello World!\n" and the code that prints it. We first test which parts of the header are modifiable (that is, we change a value and check that the program still executes). Then we hide our data and code in the header, as the code below shows (assemble with nasm -f bin ./x.asm).

This source code is based on @Matteo Italia's answer but completes the part he didn't show, of printing Hello World as well as exiting. There doesn't seem to be a way to make it any shorter; the kernel requires the file to be big enough to contain the ELF headers.
This version has some nop instructions in other space that's available for use inside / between the ELF headers which we can't avoid. We still have space to waste in p_paddr and p_align.

bits 64
            org 0x08048000

ehdr:                                           ;   Elf64_Ehdr
            db  0x7F, "ELF"                     ;   e_ident (first 4 bytes; code fills the other 12)
_start:
            mov dl, 13
            mov esi,STR
            pop rax
            syscall
            jmp _S0
            dw  2                               ;   e_type
            dw  62                              ;   e_machine
            dd  0xff                            ;   e_version
            dq  _start                          ;   e_entry
            dq  phdr - $$                       ;   e_phoff
STR:
            db "Hello Wo"                       ;   e_shoff
            db "rld!"                           ;   e_flags
            dw  0x0a                            ;   e_ehsize, the place where we hide the newline character
            dw  phdrsize                        ;   e_phentsize
phdr:                                           ;   Elf64_Phdr
            dw  1                               ;   e_phnum         p_type
            dw  0                               ;   e_shentsize
            dw  5                               ;   e_shnum         p_flags
            dw  0                               ;   e_shstrndx
ehdrsize    equ $ - ehdr
            dq  0                               ;   p_offset
            dq  $$                              ;   p_vaddr
_S0:
            nop                  ; unused space for more code
            nop
            nop
            nop
            nop                                 
            nop                                 
            jmp _S1                             ;   these 8 bytes are p_paddr; the nops show that more code could go here
            dq  filesize                        ;   p_filesz
            dq  filesize                        ;   p_memsz
_S1:
            mov eax,60 ; p_align[0:5]
            syscall    ; p_align[5:7]
            nop        ; p_align[7:8]

phdrsize    equ     $ - phdr
filesize    equ     $ - $$
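To see what the kernel is being asked to tolerate, the bytes of the hidden string can be decoded as the three header fields they overlap. This is a small sketch using Python's struct module; the offsets follow the comments in the listing above.

```python
import struct

# 13 string bytes plus the zero high byte of "dw 0x0a"
hidden = b"Hello Wo" + b"rld!" + b"\n\x00"

# Decode them as the fields they occupy: e_shoff (8), e_flags (4), e_ehsize (2)
e_shoff, e_flags, e_ehsize = struct.unpack("<QIH", hidden)

print(f"e_shoff  = {e_shoff:#x}")
print(f"e_flags  = {e_flags:#x}")
print(f"e_ehsize = {e_ehsize}")   # 10, not the real header size of 64
```

None of the three values is meaningful as header data, which matches the answer's observation that these fields can be modified and the program still runs.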

Original Post:

I have a 129-byte x64 "Hello World!".

Step 1. Assemble the following code with nasm -f bin hw.asm (this produces a flat binary named hw)

; hw.asm
  BITS 64
  org 0x400000

  ehdr:           ; Elf64_Ehdr
    db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
    times 8 db 0
    dw  2         ; e_type
    dw  0x3e      ; e_machine
    dd  1         ; e_version
    dq  _start    ; e_entry
    dq  phdr - $$ ; e_phoff
    dq  0         ; e_shoff
    dd  0         ; e_flags
    dw  ehdrsize  ; e_ehsize
    dw  phdrsize  ; e_phentsize
  phdr:           ; Elf64_Phdr
    dd  1         ; e_phnum      ; p_type
                  ; e_shentsize
    dd  5         ; e_shnum      ; p_flags
                  ; e_shstrndx
  ehdrsize  equ  $ - ehdr
    dq  0         ; p_offset
    dq  $$        ; p_vaddr
    dq  $$        ; p_paddr
    dq  filesize  ; p_filesz
    dq  filesize  ; p_memsz
    dq  0x1000    ; p_align
  phdrsize  equ  $ - phdr
  
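  _start:
    ; write(0, hello, 60): fd 0 because rdi starts as 0; the call returns 60 = __NR_exit
    pop rax           ; rax = argc = 1 = __NR_write (when run without arguments)
    mov dl, 60        ; length 60, so write's return value selects exit
    mov esi, hello
    syscall           ; write
    syscall           ; exit(0), since rax now holds 60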

  hello: db "Hello World!", 10 ; 10 is the ASCII code for newline

  filesize  equ  $ - $$

Step 2. Patch it with the following Python script

# Drop the 8-byte p_align field from the flat binary and fix up the two
# addresses that move as a result. pwntools is not needed for this.
with open('./hw', 'rb') as f:
    pro = bytearray(f.read())
print(len(pro))
cut = 0x68                       # file offset of p_align, the last phdr field
pro[0x18] = cut                  # low byte of e_entry: _start moves from 0x70 to 0x68
pro[0x74] = 0x7c - (0x70 - cut)  # low byte of the imm32 in "mov esi, hello"
pro = pro[:cut] + pro[0x70:]     # remove bytes 0x68..0x6f
print(list(pro))
with open("X", 'wb') as f:
    f.write(pro)

You should get a 129-byte "Hello World".

[18:19:02] n132 :: xps  ➜  /tmp » strace ./X
execve("./X", ["./X"], 0x7fffba3db670 /* 72 vars */) = 0
write(0, "Hello World!\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 60Hello World!
) = 60
exit(0)                                 = ?
+++ exited with 0 +++
[18:19:04] n132 :: xps  ➜  /tmp » ./X
Hello World!
[18:19:11] n132 :: xps  ➜  /tmp » ls -la ./X
-rwxrwxr-x 1 n132 n132 129 Jan 29 18:18 ./X
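The offsets hard-coded in the patch script can be derived from the header layout instead of being counted by hand. This is a sketch under the assumptions of the listing above: the program header starts 8 bytes before the end of the ELF header, and the instruction lengths are the usual x86-64 encodings.

```python
import struct

EHDR_SIZE = struct.calcsize("<16sHHIQQQIHHHHHH")        # 64
PHDR_START = EHDR_SIZE - 8                               # phdr overlaps 8 bytes of ehdr
E_ENTRY_OFF = struct.calcsize("<16sHHI")                 # e_ident, e_type, e_machine, e_version
P_ALIGN_OFF = PHDR_START + struct.calcsize("<IIQQQQQ")   # everything in phdr before p_align
START_OFF = PHDR_START + struct.calcsize("<IIQQQQQQ")    # _start follows the full phdr

# _start: pop rax (1) + mov dl,60 (2) + opcode byte of mov esi,imm32 (1)
IMM_OFF = START_OFF + 1 + 2 + 1
# full code: pop rax (1) + mov dl (2) + mov esi (5) + two syscalls (2 + 2)
HELLO_OFF = START_OFF + 1 + 2 + 5 + 2 + 2

original_size = HELLO_OFF + len(b"Hello World!\n")
patched_size = original_size - 8                         # p_align removed

print(hex(E_ENTRY_OFF), hex(P_ALIGN_OFF), hex(START_OFF), hex(IMM_OFF), hex(HELLO_OFF))
print(original_size, "->", patched_size)
```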

edited Jan 31, 2024 at 6:12

Peter Cordes

answered Jan 30, 2024 at 1:40

n132

15 Comments

Peter Cordes Over a year ago

What changes does the Python code make, and why do that with Python instead of NASM macros + directives? Clever trick, though to write with a length of 60 = __NR_exit including trailing garbage so the return value is the call number for the next syscall. And to use rax = argc as __NR_write. This also depends on stdin (fd 0) being a read-write FD that's open on the terminal, since you write(0, hello, 60).

Peter Cordes Over a year ago

This program doesn't respect ./hello > /dev/null, but breaks if you close or redirect stdin. Which is fine, it still works in a normal terminal, but worth at least a mention in the comments to document that it's intentionally writing to stdin to save bytes (because Linux initializes register values to 0 in a freshly-execed process.)

n132 Over a year ago

You are right. I used stdin to save bytes and also truncated the ELF header. I don't know how to use nasm to do that, so I just used Python and found that at most we can drop the last 8 bytes of the header. Also, we can hide the string "Hello World!\n" in the ELF header. I got a 118-byte Hello World by using this trick. (For this case I have to set RDX for SYS_write and RAX for SYS_exit correctly, since there are non-zero bytes after "Hello World\n".) It's still not hard to make it smaller than 118 bytes.

Peter Cordes Over a year ago

You can't truncate or overwrite bytes you've already emitted with NASM, so just comment out the dq 0x1000 ; p_align line to not emit those 8 bytes in the first place. (Leaving it there commented out is a good way to document what you're doing, along with other comments. Unlike your Python code full of magic numbers with no comments.)

Peter Cordes Over a year ago

since there are non-zero bytes after "Hello World\n" - can you put the code inside the ELF header instead of the string? The string is 13 bytes, the machine code is 12. Or do the bytes need to have certain values, and re-ordering your asm instructions can't achieve that?
