A few assembly syntax questions and interpretation of disassembler code

Question

I'm trying to reverse engineer an executable file as part of an exercise/challenge.

I opened the file in the IDA64 disassembler. Most of the code is hidden, but three subroutines are visible (including the start routine). My main experience with assembly was with MIPS at school, so there are a few things I'm puzzled about:

What does this line mean? Is the whole value at the address in ebx set to 0, or just its first byte?
```
mov     byte ptr [ebx], 0
```
In these two lines, what is the address `1000h`? That is, where can I find it and what does it contain? All of the code in IDA starts at 401000.
```
mov     ebx, 1000h
xor     [ebx], eax
```
Are there any conventions regarding the eax, ebx and esi registers, and what should I assume is in them at the beginning of the code? The executable asks for input from the user; can I assume that this input is in any specific register?

Don't even try to "guess" anything about assembly; there's no point (its syntax is not that "logical": it's not a programming language designed with the programmer in mind, but a set of mnemonics describing CPU hardware capabilities, which follow "hardware design" logic instead). You can start by doing a tutorial or reading a book to get the basics. As for conventions: x86 has several modes of operation, and even in the currently most common mode (64-bit) each OS has slightly different conventions, so without specifying which target platform and mode you are addressing, it's impossible to answer.
– Ped7g, Commented Aug 19, 2018 at 6:38
stackoverflow.com/tags/x86/info: check here for various resources; one of the first links is a brief intro to MASM syntax and 32-bit protected mode assembly, which will probably answer most of your syntax questions. What is at address "1000h" depends on the OS and the executable's structure; in this particular case it doesn't resemble anything valid to me. In any modern OS, a user process rarely has anything mapped in the low address space near nullptr (roughly the first 1 MB), as that area is usually marked no-access in the virtual memory map.
– Ped7g, Commented Aug 19, 2018 at 6:42
1) Move a zero to wherever EBX points. 2) 0x1000 is in memory. That's where you can find out what's in it. 3) Read the Intel docs.
– David Hoelzer, Commented Aug 19, 2018 at 9:05
MIPS is not quite, but pretty close to, a clean and orthogonal instruction set, pretty good for educational purposes. x86 is about as ugly as it gets. As for calling conventions, those are documented for both; on x86 they have varied quite a bit over time. The individual registers used to have special purposes and were not general purpose; that has changed over time, though some of the instructions kept for compatibility still reflect the old/original meanings. Part of the ugliness. Knowing another instruction set helps greatly, one with flags even better. At the end of the day though...
– old_timer, Commented Aug 19, 2018 at 23:02

Margaret Bloom · Accepted Answer · 2018-08-19 16:48:22Z


The x86 architecture is a typical CISC architecture; it can perform stores of different sizes.
A `mov [ebx], 0` is ambiguous (which size is used?), but `mov byte [ebx], 0` fixes the size to 8 bits.
The `ptr` is just an embellishment so that the instruction reads as almost self-documenting: move to the byte pointed to by `ebx` the value zero.
That also explains the semantics of the instruction: only the single byte at the address held in `ebx` is set to zero, and the bytes after it are left untouched. There are plenty of tutorials on the internet about the x86 addressing modes;
I picked the first one.
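As a rough sketch of the byte-versus-dword difference (Python rather than assembly; the `store` and `load_dword` helpers and the `bytearray` standing in for memory are hypothetical), this models what `mov byte ptr [ebx], 0` writes compared with a dword-sized store, using x86's little-endian byte order:

```python
import struct

def store(mem: bytearray, addr: int, value: int, size: int) -> None:
    """Little-endian store of `size` bytes, like x86 mov with a size qualifier."""
    mem[addr:addr + size] = value.to_bytes(size, "little")

def load_dword(mem: bytearray, addr: int) -> int:
    """Little-endian 4-byte load, like `mov eax, dword ptr [addr]`."""
    return struct.unpack_from("<I", mem, addr)[0]

# Hypothetical model: 8 bytes of memory filled with 0xFF so changes stand out,
# and ebx holding offset 0 into it.
mem = bytearray(b"\xff" * 8)
ebx = 0

store(mem, ebx, 0, 1)             # mov byte ptr [ebx], 0  -> writes 1 byte
print(hex(load_dword(mem, ebx)))  # 0xffffff00: only the lowest byte changed

store(mem, ebx, 0, 4)             # mov dword ptr [ebx], 0 -> writes 4 bytes
print(hex(load_dword(mem, ebx)))  # 0x0
```

The same `mov` mnemonic covers all sizes on x86; only the size qualifier (`byte`, `word`, `dword`) changes how many bytes are written.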

1000h is a strange address: it's probably outside the working set of the process, and it is also the typical RVA (Relative Virtual Address) of the .text section.
This makes me think that there is a relocation entry pointing at that instruction's operand.
IDA Free can't debug, but x64dbg can; try debugging the program to see whether the address turns into something like BASE_ADDRESS + 1000h.
IDA will show you a static view of the PE sections once loaded, so you can inspect the initial values of the global variables, but to see a live view of the memory you have to debug the program.
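To make BASE_ADDRESS + 1000h and the `xor [ebx], eax` that follows concrete, here is a hedged Python sketch. It assumes the image is loaded at 0x400000, the common default base for 32-bit Windows executables and consistent with IDA listing the code at 401000; the real base, and whether a relocation really applies to the operand, must be confirmed in a debugger. `rva_to_va`, `xor_dword` and the memory contents are illustrative only, not part of any tool.

```python
# Assumption: 0x400000 is the usual default ImageBase for 32-bit Windows EXEs,
# consistent with IDA showing code at 401000h; your file may use another base.
IMAGE_BASE = 0x400000
TEXT_RVA = 0x1000  # typical RVA of .text; hypothesis: the 1000h in `mov ebx, 1000h`

def rva_to_va(image_base: int, rva: int) -> int:
    """Virtual address = address the module is loaded at + relative virtual address."""
    return image_base + rva

def xor_dword(mem: bytearray, offset: int, eax: int) -> None:
    """Model of `xor [ebx], eax`: read the little-endian dword at `offset`,
    XOR it with eax and write the result back to the same place."""
    old = int.from_bytes(mem[offset:offset + 4], "little")
    mem[offset:offset + 4] = ((old ^ eax) & 0xFFFFFFFF).to_bytes(4, "little")

ebx = rva_to_va(IMAGE_BASE, TEXT_RVA)
print(hex(ebx))  # 0x401000

# Hypothetical contents of the 4 bytes at ebx and a hypothetical eax value.
mem = bytearray(b"\x90\x90\x90\x90")
eax = 0x12345678
xor_dword(mem, 0, eax)
print(mem.hex())  # e8c6a482
xor_dword(mem, 0, eax)  # XOR is its own inverse: the same eax restores the bytes
print(mem.hex())  # 90909090
```

Note that `xor [ebx], eax` is a read-modify-write of a full dword (the size comes from the 32-bit register operand), so it only makes sense once `ebx` holds an address that is actually mapped.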

Officially, at the PE entry point the registers have undefined values, but since execution starts in a user-mode library, some values leak through, though this is not a reliable ABI.
There are a few calling conventions used by compilers and by the APIs; you should get accustomed to them.
Each compiler will also have its typical register allocation algorithm, but this may be too complex to exhibit a recognizable pattern except in very simple routines.
Input values will probably be in some register at some point, but finding when and where is the hardest part.
By studying the application's behavior you can write down a set of possible input APIs the program may use and put a breakpoint on each of them.
Upon return to the program code, you'll have the input string (I/O is string based). Alternatively, you can reverse engineer the application from the start: a trained analyst can find WinMain pretty easily, and if the program is not obfuscated or written in a very abstract language, it'll be quick to find where the input is read.
A third way is to write a trimmed-down twin application using technology very close to the original's and then analyze that instead.
This way you also have source code to help you through the fog of the disassembly.


2 Comments

Peter Cordes Over a year ago

The OP says they have MIPS experience. On MIPS, different store sizes have different mnemonics (sw / sh / sb), but on x86 the byte-store opcode just uses the same mov mnemonic as the other store opcodes. (The prefix-based encoding for word vs. dword vs. qword is very CISC, though.)

old_timer Over a year ago

If you define CISC as whatever x86 did, sure. It simply happens that Intel chose a syntax that overloads the mov mnemonic with size qualifiers to select the right opcode, whereas MIPS chose different mnemonics to select the opcode. I wouldn't generalize it this way.
