I'm writing a little toy program to try to help myself better understand this language (AT&T syntax, x86_64 assembly language). Consider this code, if you'll be so kind:
.section .data
mystring: .ascii "This is my string.\0"
mystringptr: .quad mystring
When it comes time to access a given character through mystringptr, it takes an extra step, whereas accessing mystring directly is easy. Supposing I want, for some odd reason, the character at offset 5 of the string:
.section .text
movq $mystring, %rbx
movq 5(%rbx), %rdi
I would dereference %rbx with a 5-byte offset using base-plus-displacement addressing (if I'm using the right terminology; there's a good chance I'm not), and the low byte of %rdi will then hold the ASCII code for the character 'i'. I'm fine so far. But trying to get the same result from mystringptr is more difficult. I'm finding I have to do this:
.section .text
movq $mystringptr, %rbx
movq (%rbx), %rax
movq 5(%rax), %rdi
Which gets me the same result. I suppose it makes sense that, given the extra layer of abstraction the pointer adds, there is likewise another dereference involved in accessing the string through that pointer, but I'm just not intuitively grasping it. Can anyone walk me through what's happening here?
With movq $mystringptr, %rbx (or, alternatively, leaq mystringptr, %rbx), I'm moving a pointer to mystringptr into %rbx, right? And then with the next statement I'm dereferencing %rbx (which would normally retrieve the contents at that address, right?) and moving those contents into %rax. This is where I'm getting lost. I guess I just want to understand what's really happening, under the hood so to speak, with this code. It's great that it works, but I want to understand it.
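For what it's worth, here is a rough C analogy of the two access paths described above. It is only a sketch: the function names via_label and via_pointer are made up for illustration, the local variables are named after the registers purely to line up with the instructions, and a C compiler is of course free to emit different code. The point is that the label mystring is itself the address of the characters, while mystringptr is the address of 8 bytes of storage that contain that address, so one extra load is needed to fetch it.

```c
#include <assert.h>
#include <stddef.h>

/* mystring: the bytes of the string itself live at this label. */
static char mystring[] = "This is my string.";

/* mystringptr: 8 bytes of storage holding the address of mystring
   (the C analogue of `mystringptr: .quad mystring`). */
static char *mystringptr = mystring;

/* One memory read: the address of the characters is known directly. */
static char via_label(size_t offset)
{
    assert(offset < sizeof mystring);
    char *rbx = mystring;        /* movq $mystring, %rbx (no memory read) */
    return rbx[offset];          /* load from offset(%rbx) */
}

/* Two memory reads: fetch the stored address, then the character. */
static char via_pointer(size_t offset)
{
    assert(offset < sizeof mystring);
    char **rbx = &mystringptr;   /* movq $mystringptr, %rbx (no memory read) */
    char *rax = *rbx;            /* movq (%rbx), %rax: read the stored address */
    return rax[offset];          /* load from offset(%rax) */
}
```

Both via_label(5) and via_pointer(5) yield 'i'; the second just performs one more load to get there.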
Editor's footnotes about code details, leaving just the conceptual part of the question in need of answering:
.asciz "This is my string." is the usual way to write a 0-terminated C-style string in Unix assemblers like GAS, but \0 does work, since this style of assembler processes C-style escapes inside quotes.
lea mystring(%rip), %rbx (general case) or mov $mystring, %ebx (only non-PIE or largeaddressaware:no) are the standard ways to put a pointer to a label into a register in x86-64.
Also related: "How to load a single byte from address in assembly". movq mem, reg loads 8 bytes, whereas movzbl mem, reg loads 1 byte and zero-extends it.
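The load-width difference in the last footnote can be pictured in C as well. This is a hedged sketch, not a claim about what any compiler generates: load_byte_zx and load_qword are hypothetical helper names, and memcpy stands in for an unaligned 8-byte load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char mystring[] = "This is my string.";

/* Like movzbl offset(%rbx), %edi: read one byte and zero-extend it. */
static uint32_t load_byte_zx(size_t offset)
{
    assert(offset < sizeof mystring);
    return (unsigned char)mystring[offset];
}

/* Like movq offset(%rbx), %rdi: read eight consecutive bytes. */
static uint64_t load_qword(size_t offset)
{
    assert(offset + sizeof(uint64_t) <= sizeof mystring);
    uint64_t value;
    memcpy(&value, mystring + offset, sizeof value);
    return value;
}
```

load_byte_zx(5) is exactly the code for 'i', while load_qword(5) packs the eight characters "is my st" into one integer. On little-endian x86-64 the 'i' sits in the low byte, which is why the movq version in the question appears to work as long as only the low byte is examined.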
mystring is similar to char *mystring = "This is my string.";, and mystringptr is char **mystringptr = &mystring;. In assembly mystring is a pointer to the first character in the string, and mystringptr is a pointer to mystring.
mystring is static char mystring[] = "This is my string."; not a pointer to anonymous rodata. Or would be if it was .ascii "..." or .asciz "..." instead of just a bare string literal after the label, which is a syntax error since it's not a valid instruction mnemonic.
.ascii, and edited in a footnote about other improvements to the details of the code. That's not what the question was asking about at all, which is why I went for this unusual approach instead of writing an answer. I would normally just have commented, but since I was editing anyway... I ended up putting my additions at the end, rather than mixed in with the original paragraphs, since not all of them fit well mixed in. Anyway, I don't love the result, not something I'm going to do every time.