I know that each segment in an executable file needs to be aligned: before being loaded into RAM, segments are aligned to the minimum page size of 4096 bytes, and a segment that does not fit within 4096 bytes spans multiple pages in virtual memory. The kernel keeps track of these details by maintaining page tables. That relates to the paging mechanism and how RAM is accessed by a process. But I'm not sure what is happening here.
While I was playing the pico-ctf packer reverse engineering challenge, the executable provided in that challenge was a Linux ELF file. It was packed with UPX and statically linked. When I disassembled the binary file, I noticed some instructions altering the stack pointer (rsp) in a way that subtracts 0x1000 (4096 bytes in decimal) from rsp. However, I wasn't sure about this, so I Googled it. The results I found explained that stack alignment is typically done to a 16-byte or 32-byte boundary for performance, but nothing related to this exact behavior.
For example:
To align to a 16-byte boundary, the mask looks something like this: 0xFFFFFFFFFFFFFFF0
But I noticed a mask where the last three nibbles are zeros: 0xFFFFFFFFFFFFF000
0000000000401d65 <main>:
401d65: f3 0f 1e fa endbr64
401d69: 55 push rbp
401d6a: 48 89 e5 mov rbp,rsp
401d6d: 53 push rbx
401d6e: 48 81 ec 98 00 00 00 sub rsp,0x98
401d75: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
401d7c: 00 00
401d7e: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
401d82: 31 c0 xor eax,eax
401d84: 48 89 e0 mov rax,rsp
401d87: 48 89 c3 mov rbx,rax
401d8a: 48 c7 85 68 ff ff ff mov QWORD PTR [rbp-0x98],0x64
401d91: 64 00 00 00
401d95: 48 8b 85 68 ff ff ff mov rax,QWORD PTR [rbp-0x98]
401d9c: 48 89 c2 mov rdx,rax
401d9f: 48 83 ea 01 sub rdx,0x1
401da3: 48 89 95 70 ff ff ff mov QWORD PTR [rbp-0x90],rdx
---
401daa: 49 89 c0 mov r8,rax
401dad: 41 b9 00 00 00 00 mov r9d,0x0
401db3: 48 89 c6 mov rsi,rax
401db6: bf 00 00 00 00 mov edi,0x0
401dbb: ba 10 00 00 00 mov edx,0x10
401dc0: 48 83 ea 01 sub rdx,0x1
401dc4: 48 01 d0 add rax,rdx
401dc7: be 10 00 00 00 mov esi,0x10
401dcc: ba 00 00 00 00 mov edx,0x0
401dd1: 48 f7 f6 div rsi
401dd4: 48 6b c0 10 imul rax,rax,0x10
401dd8: 48 89 c2 mov rdx,rax
401ddb: 48 81 e2 00 f0 ff ff and rdx,0xfffffffffffff000
401de2: 48 89 e1 mov rcx,rsp
401de5: 48 29 d1 sub rcx,rdx
401de8: 48 89 ca mov rdx,rcx
401deb: 48 39 d4 cmp rsp,rdx
401dee: 74 12 je 401e02 <main+0x9d>
401df0: 48 81 ec 00 10 00 00 sub rsp,0x1000
401df7: 48 83 8c 24 f8 0f 00 or QWORD PTR [rsp+0xff8],0x0
401dfe: 00 00
401e00: eb e9 jmp 401deb <main+0x86>
401e02: 48 89 c2 mov rdx,rax
401e05: 81 e2 ff 0f 00 00 and edx,0xfff
401e0b: 48 29 d4 sub rsp,rdx
401e0e: 48 89 c2 mov rdx,rax
401e11: 81 e2 ff 0f 00 00 and edx,0xfff
401e17: 48 85 d2 test rdx,rdx
401e1a: 74 10 je 401e2c <main+0xc7>
401e1c: 25 ff 0f 00 00 and eax,0xfff
401e21: 48 83 e8 08 sub rax,0x8
401e25: 48 01 e0 add rax,rsp
401e28: 48 83 08 00 or QWORD PTR [rax],0x0
401e2c: 48 89 e0 mov rax,rsp
401e2f: 48 83 c0 00 add rax,0x0
401e33: 48 89 85 78 ff ff ff mov QWORD PTR [rbp-0x88],rax
---
401e3a: 48 b8 37 30 36 39 36 movabs rax,0x6636333639363037
401e41: 33 36 66
401e44: 48 ba 34 33 35 34 34 movabs rdx,0x6237363434353334
401e4b: 36 37 62
401e4e: 48 89 45 80 mov QWORD PTR [rbp-0x80],rax
401e52: 48 89 55 88 mov QWORD PTR [rbp-0x78],rdx
401e56: 48 b8 35 35 33 39 35 movabs rax,0x6635383539333535
401e5d: 38 35 66
401e60: 48 ba 35 35 36 65 35 movabs rdx,0x3433303565363535
401e67: 30 33 34
401e6a: 48 89 45 90 mov QWORD PTR [rbp-0x70],rax
401e6e: 48 89 55 98 mov QWORD PTR [rbp-0x68],rdx
401e72: 48 b8 36 33 36 62 33 movabs rax,0x6534313362363336
401e79: 31 34 65
401e7c: 48 ba 33 36 35 66 34 movabs rdx,0x3133323466353633
401e83: 32 33 31
401e86: 48 89 45 a0 mov QWORD PTR [rbp-0x60],rax
401e8a: 48 89 55 a8 mov QWORD PTR [rbp-0x58],rdx
401e8e: 48 b8 36 65 33 34 35 movabs rax,0x3936323534336536
401e95: 32 36 39
401e98: 48 ba 33 33 35 33 35 movabs rdx,0x3133663533353333
401e9f: 66 33 31
401ea2: 48 89 45 b0 mov QWORD PTR [rbp-0x50],rax
401ea6: 48 89 55 b8 mov QWORD PTR [rbp-0x48],rdx
401eaa: 48 b8 36 31 33 35 36 movabs rax,0x3333313635333136
401eb1: 31 33 33
401eb4: 48 ba 36 36 33 33 33 movabs rdx,0x6437393333333636
401ebb: 39 37 64
401ebe: 48 89 45 c0 mov QWORD PTR [rbp-0x40],rax
401ec2: 48 89 55 c8 mov QWORD PTR [rbp-0x38],rdx
401ec6: 48 c7 45 d0 00 00 00 mov QWORD PTR [rbp-0x30],0x0
401ecd: 00
401ece: 48 c7 45 d8 00 00 00 mov QWORD PTR [rbp-0x28],0x0
401ed5: 00
401ed6: c7 45 e0 00 00 00 00 mov DWORD PTR [rbp-0x20],0x0
401edd: 48 8d 3d 24 31 09 00 lea rdi,[rip+0x93124] # 495008 <_IO_stdin_used+0x8>
401ee4: b8 00 00 00 00 mov eax,0x0
401ee9: e8 12 ed 00 00 call 410c00 <_IO_printf>
401eee: 48 8b 15 e3 e7 0b 00 mov rdx,QWORD PTR [rip+0xbe7e3] # 4c06d8 <stdin>
401ef5: 48 8b 85 68 ff ff ff mov rax,QWORD PTR [rbp-0x98]
401efc: 89 c1 mov ecx,eax
401efe: 48 8b 85 78 ff ff ff mov rax,QWORD PTR [rbp-0x88]
401f05: 89 ce mov esi,ecx
401f07: 48 89 c7 mov rdi,rax
401f0a: e8 21 66 01 00 call 418530 <_IO_fgets>
401f0f: 48 8b 85 78 ff ff ff mov rax,QWORD PTR [rbp-0x88]
401f16: 48 89 c6 mov rsi,rax
401f19: 48 8d 3d 11 31 09 00 lea rdi,[rip+0x93111] # 495031 <_IO_stdin_used+0x31>
401f20: b8 00 00 00 00 mov eax,0x0
401f25: e8 d6 ec 00 00 call 410c00 <_IO_printf>
401f2a: 48 8b 95 68 ff ff ff mov rdx,QWORD PTR [rbp-0x98]
401f31: 48 8d 4d 80 lea rcx,[rbp-0x80]
401f35: 48 8b 85 78 ff ff ff mov rax,QWORD PTR [rbp-0x88]
401f3c: 48 89 ce mov rsi,rcx
401f3f: 48 89 c7 mov rdi,rax
401f42: e8 89 f1 ff ff call 4010d0 <.plt+0xb0>
401f47: 85 c0 test eax,eax
401f49: 75 1a jne 401f65 <main+0x200>
401f4b: 48 8d 3d f6 30 09 00 lea rdi,[rip+0x930f6] # 495048 <_IO_stdin_used+0x48>
401f52: e8 a9 6c 01 00 call 418c00 <_IO_puts>
401f57: 48 8d 45 80 lea rax,[rbp-0x80]
401f5b: 48 89 c7 mov rdi,rax
401f5e: e8 9d 6c 01 00 call 418c00 <_IO_puts>
401f63: eb 0c jmp 401f71 <main+0x20c>
401f65: 48 8d 3d 50 31 09 00 lea rdi,[rip+0x93150] # 4950bc <_IO_stdin_used+0xbc>
401f6c: e8 8f 6c 01 00 call 418c00 <_IO_puts>
401f71: 48 89 dc mov rsp,rbx
401f74: b8 00 00 00 00 mov eax,0x0
401f79: 48 8b 5d e8 mov rbx,QWORD PTR [rbp-0x18]
401f7d: 64 48 33 1c 25 28 00 xor rbx,QWORD PTR fs:0x28
401f84: 00 00
401f86: 74 05 je 401f8d <main+0x228>
401f88: e8 93 2e 05 00 call 454e20 <__stack_chk_fail>
401f8d: 48 8b 5d f8 mov rbx,QWORD PTR [rbp-0x8]
401f91: c9 leave
401f92: c3 ret
Stack Layout I Observed:
In the prologue, 0x98 is subtracted from rsp to allocate space on the stack.
0x98 -> 152 / 8 -> 19 qwords
return address <+ 8>
rbp <+0>
rbx <-8>
qword-19 <-16>
qword-18 <-24> <canary>
qword-17 <-32> <0x0000000000000000>
qword-16 <-40> <0x0000000000000000>
qword-15 <-48> <0x0000000000000000>
qword-14 <-56> <0x6437393333333636>
qword-13 <-64> <0x3333313635333136>
qword-12 <-72> <0x3133663533353333>
qword-11 <-80> <0x3936323534336536>
qword-10 <-88> <0x3133323466353633>
qword-09 <-96> <0x6534313362363336>
qword-08 <-104> <0x3433303565363535>
qword-07 <-112> <0x6635383539333535>
qword-06 <-120> <0x6237363434353334>
qword-05 <-128> <0x6636333639363037>
qword-04 <-136> <rsp> pointer (pointing_to_where)?
qword-03 <-144> <0x63> 99
qword-02 <-152> <0x64> 100 for input length
qword-01 <-160>
Questions:
1. I noticed a mask is ANDed with the value in rdx:
401ddb: and rdx, 0xFFFFFFFFFFFFF000
After rdx is ANDed with this mask, it is subtracted from a copy of rsp, and then there is a loop that compares rsp with the result and continues until both are equal. The loop starts at 0x401deb and ends at 0x401e00. On each iteration, 0x1000 (4096 bytes) is subtracted from rsp and the qword at rsp+0xff8 is touched with an or instruction. But I'm not sure exactly what this is doing.
2. After the loop ends, I noticed something at 0x401e02 to 0x401e28. In this code block, rdx (the low 12 bits of rax) is subtracted from rsp and the result is stored back into rsp:
sub rsp, rdx
Here, I'm totally unsure what this code block is doing, because of the confusion from the previous instructions related to the 4096-byte alignment.
3. At 0x401e33, I noticed rax is stored to [rbp-0x88] after rsp is copied into rax. I think this is where a pointer to another location on the stack is saved. But I'm not sure where exactly this pointer is pointing. Based on the stack layout I observed, can you help me understand the behavior of these instructions?
These are the questions I'm searching for answers to. One more thing: I found the flag (hardcoded value) early on, and that was easy. But I want to understand what's happening in the disassembly.
Comments:
or QWORD PTR [rsp+0xff8],0x0. It is expanding the stack page by page to ensure that it triggers stack expansion correctly.
_chkstk function: What is the purpose of the _chkstk() function? (My answer on the first Q&A I linked in my last comment also mentions it and talks about Windows.)
int n = 10; int arr[n]; is a VLA. (Because it wasn't const int n = 10;, and constant-propagation across statements only happens with optimization enabled.) You'd have to try very hard to get GCC to emit that div/imul sequence in unoptimized code from plain C other than a VLA; probably impossible since GCC will optimize n / 16 to use AND even at -O0. And then using that result as a stack-allocation size with stack probes is pretty much absolute proof of variable-size stack allocation.