How and why does my program change its input buffer? Using GDB to find out where. (Converting string to int in NASM x86 32bit)

Question

%macro mov_dd 2
    push eax
    push ebx
    mov dword eax, [%1]
    mov ebx, [eax]
    mov dword [%2], ebx
    pop ebx
    pop eax
%endmacro

section .data
    text db "Enter first Number: "
    len equ $-text

section .bss
    input resb 4
    ten_exp_num resb 4
    input_ptr resb 4



section .text
    global _start

_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, text
    mov edx, len
    int 0x80            ; print(text)

    mov eax, 3
    xor ebx, ebx
    mov ecx, input
    mov edx, 16
    int 0x80            ; input = input()

    xor edi, edi
    xor eax, eax

    mov dword [input_ptr], input

    push ebp
    mov ebp, esp

    mov byte [ten_exp_num], 1

    call convert
    pop ebp

    mov [input], eax
    mov ecx, input
    mov eax, 4
    mov ebx, 1
    mov edx, 1
    int 0x80

    jmp end

convert:
    mov_dd input_ptr, input
    mov bl, [input]

    cmp bl, 10          ; 10 ≙ '\n'
    je return_func

    call to_int

    inc dword [input_ptr]

    jmp convert

to_int:
    sub bl, "0"
    
    movzx edi, bl

    imul edi, [ten_exp_num]

    add eax, edi
    call ten_expo
    ret

ten_expo:
    push eax
    
    mov eax, [ten_exp_num]
    imul eax, 10
    mov [ten_exp_num], eax

    pop eax
    ret

return_func:
    ret

end:
    mov eax, 1
    xor ebx, ebx
    int 0x80            ; return 0

I'm really new to assembly programming and am currently trying to write a calculator to get a basic understanding of it.
When I debug with gdb and use e.g. 12\n as input, everything works fine until the convert loop reaches the third char (after the macro), which should be \n but is actually just 0x00. I have no clue why that happens and have already tried some things, like changing

inc dword [input_ptr]

to
inc byte [input_ptr]

but that didn't seem to help.

Can someone tell me why that happens and how to fix it?
(I know my code would convert in wrong order but I don't want to fix it before this works)

A simple calculator for Linux and Windows (CalcL32) is part of my assembly tutorial: euroassembler.eu/eadoc/tut_eng.htm. See also its more sophisticated version: euroassembler.eu/eurotool/eurocalc.htm
– vitsoft, Commented Nov 16 at 13:57

Peter Cordes · Accepted Answer · 2025-11-16 13:21:45Z

I ran your code under GDB and set a watchpoint (watch (char[4])input) to detect which instruction changes your input: resb 4 buffer, using the same input you used: 12 followed by Enter.

The first change comes from the read system call, as expected, but then mov DWORD PTR ds:0x804a014,ebx changes it from "12\n\0" to "2\n\0\n". That's from mov dword [%2], ebx in the macro expansion of mov_dd input_ptr, input.

IDK what the point of mov_dd input_ptr, input was supposed to be, but there's the culprit. I think you're loading the 4 bytes pointed-to by the current input pointer and storing them back into the input buffer. (So you read past the end of your input buffer, into bytes that are part of ten_exp_num, and copy those bytes into input.)

Update: you're always reading from the start of input with mov bl, [input], and were trying to shift the whole input over instead of just reading the byte pointed-to by input_ptr.

If you keep stepping, you'll see that instead of shifting by 1 each time, you're shifting by an increasing amount as input_ptr gets farther into the buffer, farther away from the read position at [input + 0]. The second shift starts with

2 \n \0 \n   [bytes of ten_exp_num resb 4  where the 0xa = 1*10 came from]
     ^
     |
   input_ptr (after having been incremented twice)

Loading 4 bytes from there, the first byte is 0, which you then store over input. One way to fix this would be to only load a byte and copy it to the first byte of input, but then you might as well just have used that byte you loaded.

Or to keep this extremely clunky byte-shifting logic, keep input_ptr = input+1 the whole time so you do 4-byte loads and stores that overlap by 3 bytes, after loading the first byte.
The sane way to shift 4 bytes in memory would be shr dword [input], 8 or ror dword [input], 8, which avoids loading from past the end of the buffer. But neither of these scales easily to input buffers longer than 4 bytes. UINT32_MAX is 4294967295, which is 10 digits, so 11 bytes including a newline, assuming the user submits input by pressing Enter rather than with Ctrl-D on the terminal or by redirecting from a file or a pipe. (This is just a toy program, so it's fine to assume the input is only digits followed by a newline and to ignore the return value of the read system call.)

You're already incrementing a pointer, so just use movzx edx, byte [esi] or something similar to load the byte pointed-to by that pointer.

Normally you'd want to keep pointers and integers in registers, only using memory for arbitrary-length stuff like strings. For example Convert string to int. x86 32 bit Assembler using Nasm uses the usual total = total*10 + digit-'0' algorithm, stopping on the first byte that isn't in the '0' .. '9' range. See also NASM Assembly convert input to integer? for more explanation about that algorithm and a more efficient check for being a digit. (Since you already want to sub digit, '0' for later use, do that first and check if the result is unsigned 0..9, which only takes one cmp/jna).
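A minimal sketch of that total = total*10 + digit-'0' loop, with the pointer kept in ESI instead of in memory. The labels .next_digit and .done, the 16-byte buffer, and reporting the result through the exit status are assumptions made for illustration, not part of the original code, and overflow is not checked.

```nasm
section .bss
    input resb 16               ; assumed size: 10 digits + newline fit easily

section .text
    global _start

_start:
    mov eax, 3                  ; sys_read
    xor ebx, ebx                ; fd 0 = stdin
    mov ecx, input
    mov edx, 15                 ; read at most 15 bytes, so at least one zero
    int 0x80                    ; byte of the zeroed BSS stays as a terminator

    mov esi, input              ; ESI = pointer to the current character
    xor eax, eax                ; EAX = running total

.next_digit:
    movzx edx, byte [esi]       ; load the byte ESI points to, zero-extended
    sub edx, '0'                ; ASCII digit -> 0..9
    cmp edx, 9
    ja .done                    ; unsigned compare: '\n', 0, etc. end the loop
    lea eax, [eax + eax*4]      ; total * 5
    lea eax, [edx + eax*2]      ; total * 10 + digit
    inc esi                     ; advance the pointer, buffer stays untouched
    jmp .next_digit

.done:
    mov ebx, eax                ; exit status (only the low 8 bits are visible)
    mov eax, 1                  ; sys_exit
    int 0x80
```

Built as a 32-bit executable (for example with nasm -f elf32 and ld -m elf_i386), entering 12 should give an exit status of 12, which echo $? shows in the shell. Because the input buffer is never modified, making it larger only means changing the resb size and the read length.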

Pointers are 32-bit, so you definitely want dword operand-size for the increment, whether that's on a pointer in memory or, much more simply, inc esi on a pointer in a register. byte operand-size would only operate on the low 8 bits, so it would wrap around the low byte without propagating carry into the high bits of the pointer. That would loop over a 256-byte aligned chunk of memory, e.g. going from 0x4000ff to 0x400000 instead of 0x400100 for a normal pointer increment. It's equivalent here if input: resb 4 is aligned by 4, which it should be at the start of the BSS, but wrong in general. (And slower: you get a store-forwarding stall when you reload a dword after a byte store.)
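The wrap-around can be seen with plain integers, without dereferencing anything. In this small sketch the names ptr_a and ptr_b and the value 0x004000ff are illustrative assumptions; the two variables only hold pointer-like numbers.

```nasm
section .data
    ptr_a dd 0x004000ff         ; hypothetical pointer value, low byte = 0xff
    ptr_b dd 0x004000ff         ; same starting value

section .text
    global _start

_start:
    inc byte [ptr_a]            ; only the low byte wraps: 0x00400000
    inc dword [ptr_b]           ; full 32-bit increment:   0x00400100

    mov ebx, [ptr_b]
    sub ebx, [ptr_a]            ; difference = 0x100
    shr ebx, 8                  ; scale down to fit an exit status: 1
    mov eax, 1                  ; sys_exit
    int 0x80
```

An exit status of 1 means the two results differ by 0x100, i.e. the byte-sized increment dropped the carry into the upper bytes.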

edited Nov 16 at 13:21

answered Nov 16 at 12:20

Peter Cordes


8 Comments

user31887642 Nov 16 at 12:37

Thanks a lot for your answer, I think I now see the problem. I still want to try to get this running, even though it is slow and clunky, just to get a better understanding... Can you explain further what you meant here: "You're already incrementing a pointer, just movzx edx, byte [esi] or something to the byte pointed-to by that pointer."? I don't seem to understand how that works...

user31887642 Nov 16 at 12:41

Also, I understand why I'm getting random chars at the end of input, but why does it do this: Old value = "2\n\000\n", New value = "\000\nd"? Shouldn't it just increase the ptr by one byte?

Peter Cordes Nov 16 at 12:52

That's what your code should do, but it isn't. Instead you copy the dword pointed-to by input_ptr over the dword at input, so you're also shifting the bytes in memory, so input_ptr points to a different place relative to the original input. Just don't do that. To understand what's happening, have GDB display (char[4])input and/or display (char*)input_ptr so it prints them after every single-step.

user31887642 Nov 16 at 13:00

I see. Let's say I wanted to solve it with that approach: why is \n not there after the second shift?

Peter Cordes Nov 16 at 13:11

Single-step your code with a debugger and check what 4 bytes get loaded into EBX by the previous instruction of that macro. You're reading outside the input buffer (into bytes that are part of ten_exp_num: resb 4), which I think is where the 0xa at the end comes from, since that's 1 * 10. If you keep stepping, you'll see that instead of shifting by 1 each time, you're shifting by an increasing amount as input_ptr gets farther into the buffer, farther away from the read position at [input + 0].

Peter Cordes Nov 16 at 13:22

Updated my answer with those details.

Peter Cordes Nov 16 at 12:55

Instead of mov_dd input_ptr, input / mov bl, [input], do something like mov ebx, [input_ptr] ; movzx ebx, byte [ebx] to load and deref the pointer. (Or simpler, just keep input_ptr in ESI instead of in memory.) So instead of loading from the start of your input buffer every time and shifting the bytes over, you just load the byte input_ptr is pointing to. Not modifying the input buffer also lets you make it larger much more conveniently. Your code depends on finding a newline, so it can't even handle 1234, let alone UINT_MAX 4294967295.
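As a rough sketch of the load-and-deref suggestion in the comment above, keeping input_ptr in memory as the question does: the hard-coded "12\n" input, the done label, and counting characters into the exit status are assumptions for illustration only.

```nasm
section .data
    input     db "12", 10       ; hard-coded sample input, assumed for the demo
    input_ptr dd input          ; pointer variable, starts at the first byte

section .text
    global _start

_start:
    xor eax, eax                ; EAX = number of characters before '\n'

convert:
    mov ebx, [input_ptr]        ; load the pointer itself
    movzx ebx, byte [ebx]       ; dereference it: one byte, zero-extended
    cmp bl, 10                  ; 10 = '\n'
    je done
    inc eax
    inc dword [input_ptr]       ; dword increment of the pointer in memory
    jmp convert

done:
    mov ebx, eax                ; exit status = 2 for "12\n"
    mov eax, 1                  ; sys_exit
    int 0x80
```

Nothing is ever stored back into input here, so the newline is still in place when the pointer reaches it.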

Peter Cordes Nov 16 at 12:57

And BTW, if you did want to modify the input buffer, one less-bad way to do that is shr dword [input], 8, since x86 is little-endian. (Or ror by 8 or rol by 24, to keep the ASCII bytes around, although you won't know where in the buffer since that depends on length.) You could do that on a register instead of memory, like shr edx, 8 / movzx ebx, dl
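A hedged sketch of that register variant, for inputs that fit in 4 bytes: the whole buffer is loaded once into EDX and consumed from the low byte. The hard-coded data, the labels, and the exit-status reporting are illustrative assumptions.

```nasm
section .data
    input db "12", 10, 0        ; exactly 4 bytes, assumed sample input

section .text
    global _start

_start:
    mov edx, [input]            ; little-endian load: EDX = 0x000A3231
    xor eax, eax                ; EAX = running total

.next:
    movzx ebx, dl               ; low byte = next character in memory order
    sub ebx, '0'                ; ASCII digit -> 0..9
    cmp ebx, 9
    ja .done                    ; '\n' or 0 is not a digit: stop
    lea eax, [eax + eax*4]      ; total * 5
    lea eax, [ebx + eax*2]      ; total * 10 + digit
    shr edx, 8                  ; shift the next character into DL
    jmp .next

.done:
    mov ebx, eax                ; exit status = 12 for this input
    mov eax, 1                  ; sys_exit
    int 0x80
```

Since shr fills EDX with zeros from the top, the loop always ends after at most four characters, but the approach does not extend to longer inputs.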
