
I can't understand how memory is allocated in the following code:

#include<stdio.h>
#include<string.h>

int main()
{
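    char a[]={"text"};            /* 5 bytes: 't','e','x','t','\0' */
    char b[]={'t','e','x','t'};   /* 4 bytes, no terminating '\0' */
    /* sizeof and strlen yield size_t, so %zu (not %d) is the matching conversion */
    printf(":%s: sizeof(a)=%zu, strlen(a)=%zu\n",a, sizeof(a), strlen(a));
    /* b is not a string: both %s and strlen(b) read past its end (undefined behavior) */
    printf(":%s: sizeof(b)=%zu, strlen(b)=%zu\n",b, sizeof(b), strlen(b));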
    return 0;
}

The output is

:text: sizeof(a)=5, strlen(a)=4
:texttext: sizeof(b)=4, strlen(b)=8

Looking at the memory addresses and the output, it seems that b is placed before a in memory, and that is why strlen(b), which scans for a \0, returns 8. Why does this happen? I expected a to be placed first, since it is declared first.

  • You are not guaranteed that memory is allocated in any particular order. a could come before or after b, or they could be nowhere near each other. Commented Jun 25, 2011 at 21:40
  • It is. That's why it is higher in the stack. But you should really not rely on expectations like that. Commented Jun 25, 2011 at 21:41
  • Using strlen() on an array of chars can be dangerous. Commented Jun 25, 2011 at 23:48
  • @JustAnotherProgrammer: No more dangerous than calling strlen on allocated memory. Commented Jun 26, 2011 at 1:15

4 Answers


The language makes no guarantees about what is placed where, so your experiment makes very little sense. It might work, it might not: the behavior is undefined. Your b is not a string (it has no terminating \0), and calling strlen on something that is not a string is undefined behavior (UB).

From a purely practical point of view, though, local variables are usually allocated on the stack, and on many modern platforms (like x86) the stack grows downward, i.e. from higher addresses to lower addresses. So, if you are using one of these platforms, it is possible that your compiler allocated the variables in their declaration order (a first, b second), but because the stack grows downward, b ended up at a lower address than a. I.e. b ended up before a in memory.

Note, though, that a typical implementation does not normally allocate stack space for local variables one by one. Instead, the memory for all local variables (the stack frame) is allocated at once, so the logic I described above does not necessarily apply. Still, the compiler may follow the "reverse" approach to local variable layout anyway, i.e. place variables declared earlier at higher addresses in the frame, "as if" they had been allocated one by one in declaration order.


Your b character array is not null-terminated. To see why that matters, consider that the char a[] declaration is equivalent to:

char a[] = { 't', 'e', 'x', 't', '\0' };

In other words, strlen(b) is undefined: it just scans whatever memory follows b until it happens to find a null character (a 0 byte).

2 Comments

It seems to me that the question makes it clear that the OP knows that. The question is really "why is a above b?"
@Pascal: It's his compiler or something else, see my answer.

I do not get the same output; see my ideone snippet: http://ideone.com/zHhHc

:text: sizeof(a)=5, strlen(a)=4
:text

When I use codepad, I see different output than you: http://codepad.org/MXJWY136

:text: sizeof(a)=5, strlen(a)=4
:text: sizeof(b)=4, strlen(b)=4

Also, when I compile it with a C++ compiler, I get the same output as on codepad: http://ideone.com/aLNjv

:text: sizeof(a)=5, strlen(a)=4
:text: sizeof(b)=4, strlen(b)=4

So something is definitely wrong on your platform and/or compiler. It could be undefined behavior (UB) due to the fact that your char array does not have a null-terminator (\0). At any rate...

While a and b may look the same, they are not, because of how you have defined the character arrays.

char a[] = "text";

What this array looks like in memory is the following:

----------------------
| t | e | x | t | \0 |
----------------------

The double quotes denote a string literal, which gets a \0 added automatically (that's why sizeof(a) is 5). In b you would have to add the \0 yourself; since you didn't, sizeof(b) is 4 and b contains no terminator. strlen(b) therefore keeps reading past the end of b until it happens to hit a 0 byte in your implementation, possibly counting garbage characters along the way. Char arrays that are not null-terminated are a common source of security problems.

13 Comments

You've said "correct output" here multiple times, but surely strlen(b) is UB?
@0A0D: Undefined behavior. You vaguely mention it later on (at the bottom). I guess my point is that technically any answer could be "correct" for strlen(b).
@Frexus: You have not shown how you are inferring b is declared before a, other than you have a doubling up of the word text. This is strange for sure, since three different compilers I have shown do not display that behavior. Since the "string" is not null-terminated, anything can happen.
@0A0D, why do you say something is wrong with the platform or the compiler? The OP is causing undefined behaviour - that's certainly not his tools' fault.
Wrong, wrong, wrong. There is nothing wrong with their compiler - you have to work from the standard(s), not the compiler when determining what is correct behaviour or otherwise. All the deductive reasoning in the world will not help you if you begin with a false premise.

I compiled your code on Linux/x86 with GCC using the -S flag to see the assembly output. It shows that, for me, b[] is allocated at a higher memory address than a[], so strlen(b) did not run into a[] and I did not get strlen(b)=8.

    .file   "str.c"
    .section    .rodata
    .align 4
.LC0:
    .string ":%s: sizeof(a)=%d, strlen(a)=%d\n"
    .align 4
.LC1:
    .string ":%s: sizeof(b)=%d, strlen(b)=%d\n"
    .text
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $32, %esp
    movl    %gs:20, %eax
    movl    %eax, 28(%esp)
    xorl    %eax, %eax
    movl    $1954047348, 19(%esp)
    movb    $0, 23(%esp)
    movb    $116, 24(%esp)
    movb    $101, 25(%esp)
    movb    $120, 26(%esp)
    movb    $116, 27(%esp)
    leal    19(%esp), %eax
    movl    %eax, (%esp)
    call    strlen
    movl    %eax, %edx
    movl    $.LC0, %eax
    movl    %edx, 12(%esp)
    movl    $5, 8(%esp)
    leal    19(%esp), %edx
    movl    %edx, 4(%esp)
    movl    %eax, (%esp)
    call    printf
    leal    24(%esp), %eax
    movl    %eax, (%esp)
    call    strlen
    movl    $.LC1, %edx
    movl    %eax, 12(%esp)
    movl    $4, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    %edx, (%esp)
    call    printf
    movl    $0, %eax
    movl    28(%esp), %edx
    xorl    %gs:20, %edx
    je  .L2
    call    __stack_chk_fail
.L2:
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

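In the code above, $1954047348 (0x74786574, the bytes "text" in x86's little-endian order) stored at 19(%esp), followed by $0 at 23(%esp), is a[] with its null terminator. The 4 bytes after that, 24(%esp) through 27(%esp), are b[]. So this compiler put b[] at a higher address than a[], as if b[] had been pushed on the stack before a[], since the stack grows down on this platform.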

If you compile your code with -S (or your compiler's equivalent), you should see b[] at a lower address than a[], which is why you get strlen(b)=8.
