When I run the following code:

#include <stdio.h>

int main(void)
{
    char tmp[] = "hello";
    printf("%p, %p\n", (void *)tmp, (void *)&tmp);
    return 0;
}

I get the same address twice. But with the following code, the two addresses are different:

#include <stdio.h>

int main(void)
{
    char *tmp = "hello";
    printf("%p, %p\n", (void *)tmp, (void *)&tmp);
    return 0;
}

Could you explain the memory differences between those examples?

  • char tmp[] = "hello" is an array of 6 characters initialized to "hello\0" (it has automatic storage duration and resides within the program stack). char *tmp = "hello"; is a pointer initialized with the address for the string literal "hello\0" that resides in readonly memory (generally within the .rodata section of the executable). (readonly on all but a few non-standard implementations) An array is converted to a pointer to its first element on access. Commented Jun 7, 2021 at 3:54
  • @David C. Rankin Re "readonly on all but a few non-standard implementations", I find it doubtful that C requires a machine to have virtual memory to have a standard implementation. One should always consider the memory to be read-only, but I challenge the claim that the memory has to be read-only for the implementation to be standard. Commented Jun 7, 2021 at 4:00
  • @ikegami I concede that point. The standard doesn't require a conforming implementation to create string literals in read-only memory. The point I was making is that most do. Commented Jun 7, 2021 at 4:06
  • At the very least, the C standard states that modifying string literals is undefined behaviour. Commented Jun 7, 2021 at 4:51
  • While legal in C you shouldn't assign string literals to non-const char pointers, always do char const* ptr = "some literal"; – otherwise you almost certainly will run into modifying the literal at some point in the future, which is UB, as stated above. Being able to assign immutable literals to char* pointers is a legacy from the very first days of C where const did not yet exist. Commented Jun 8, 2021 at 13:46

3 Answers

5

char tmp[] = "hello"; is an array of 6 characters initialized to "hello\0" (it has automatic storage duration and typically resides on the program stack).

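char *tmp = "hello"; is a pointer to char initialized with the address of the string literal "hello\0", which typically resides in read-only memory (generally within the .rodata section of the executable). The C standard does not require string literals to be placed in read-only memory, but most implementations do so, and modifying a string literal is undefined behavior either way.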

When you have char tmp[] = "hello";, as stated above, on access the array is converted to a pointer to the first element of tmp. The resulting pointer has type char *. When you take the address of tmp (e.g. &tmp) it will resolve to the same address, but has a completely different type. It will be a pointer to an array of 6 char. The formal type is char (*)[6]. And since type controls pointer arithmetic, iterating with the different types will produce different offsets when you advance the pointer. Advancing tmp will advance to the next char. Advancing with the address of tmp will advance to the beginning of the next 6-character array.

When you have char *tmp = "hello"; you have a pointer to char. When you take the address, the result is pointer-to-pointer-to char. The formal type is char ** reflecting the two levels of indirection. Advancing tmp advances to the next char. Advancing with the address of tmp advances to the next pointer.

4 Comments

char tmp[]: Advance tmp is unlucky wording, as arrays cannot be incremented (like ++tmp). I wouldn't describe 'string literal "hello\0"' as that would imply a literal with two trailing null characters.
Yes, I was referring to the pointer that results from access. Obviously you cannot iterate with the array itself. The intent being char *p = tmp; or char (*p)[6] = &tmp; in the array case. Thanks for pointing that out.
So to clear up my mind, (supposing tmp1 and tmp2 in the order on your answer), from an Assembly point of view tmp1 == &tmp1, and tmp2 != &tmp2. tmp1 and &tmp1 are exactly the same except in C they have different types (both the same stack address); tmp2 is the pointer to the 1st string char (which might be on the stack or on read-only data or something), and &tmp2 is a pointer to tmp2, which in this case will be a stack address (because tmp2 is a local variable - or at least supposing it is). And these things are the same passed to a function as arguments. Is this correct?
@DADi590 - you are dead-on. That is the exact case. tmp1 and &tmp1 resolve to the same address. tmp2 is a pointer to the 1st char in the string, and &tmp2 is the address of that pointer, not the address of the 1st character in the string. In other words, the address of the 1st character in the string is the address pointed to (i.e. held) by tmp2, and &tmp2 is where that address is stored in memory.
3
char a[] = "hello";

and

char *a = "hello";

are stored in different places.

char a[] = "hello"

In this case, a is an array (typically stored on the stack) of 6 characters initialized to "hello\0". It is equivalent to:

char a[6];
a[0] = 'h';
a[1] = 'e';
a[2] = 'l';
a[3] = 'l';
a[4] = 'o';
a[5] = '\0';

char *a = "hello"

Inspect the generated assembly (only the relevant part is shown):

    .file   "so.c"
    .text
    .section    .rodata
.LC0:
    .string "hello"    # the string literal is stored here
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    $.LC0, -8(%rbp)
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

See

.section    .rodata
.LC0:
    .string "hello"

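This is where the string is stored. With char a[], the characters themselves are stored on the stack. With char *a, only the pointer is stored on the stack (at -8(%rbp) in the listing above), while the string it points to is stored wherever the compiler likes, generally in .rodata.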

3

With

char tmp[] = "hello";

you are setting aside an array of char large enough to store the string "hello" and copying the contents of the string to that array, such that you get this in memory:

     +---+
tmp: |'h'| tmp[0]
     +---+
     |'e'| tmp[1]
     +---+
     |'l'| tmp[2]
     +---+
     |'l'| tmp[3]
     +---+
     |'o'| tmp[4]
     +---+
     | 0 | tmp[5]
     +---+

There is no tmp object separate from the array elements themselves, so the address of the array (&tmp) is the same as the address of its first element (&tmp[0]).

With

char *tmp = "hello";

you are creating a pointer to char and initializing it with the address of the first character in the string literal "hello", such that you get this in memory:

     +---+       +---+
tmp: |   | ----> |'h'| tmp[0]
     +---+       +---+
                 |'e'| tmp[1]
                 +---+
                 |'l'| tmp[2]
                 +---+
                 |'l'| tmp[3]
                 +---+
                 |'o'| tmp[4]
                 +---+
                 | 0 | tmp[5]
                 +---+

In this case tmp is a separate object from the array elements, so the address of tmp is different from the address of tmp[0].
