0

Does the definition:

char arr_of_chars[] = "hello world";

create a constant character array (null terminated) somewhere in memory, and then copy the content of that array to arr_of_chars, or does it directly assign it to arr_of_chars?

What exactly is the mechanism that works here?

5
  • "hello world" is a null-terminated string placed somewhere in the system. When the variable is initialized, the string is copied to RAM. Commented Oct 4, 2017 at 6:41
  • If the variable is global or static, then so is the initialization, and there is no copying being done at runtime. If the variable is automatic, then the initialization is dynamic and is performed each time the variable is instantiated. Commented Oct 4, 2017 at 6:45
  • @TomKarzes I don't believe that. Consider this: const char a[] = "hello world"; char b[] = "hello world";. You have two identical values, and even if they are global or static, you will have only one copy of hello world stored in the .rodata section, especially on embedded systems. It will first be copied to the non-const variable at run time. Commented Oct 4, 2017 at 6:47
  • @tilz0R this is entirely up to the compiler to decide. What's "better" depends on the target system and exact requirements. Commented Oct 4, 2017 at 7:01
  • @tilz0R In that case, b will normally be placed in the data section with the corresponding initialization value. The initialization will occur as a consequence of the program being loaded into memory. Commented Oct 4, 2017 at 9:08

4 Answers

1

What you're asking is not specified by C. In a nutshell, C is specified in terms of an abstract machine and its observable behavior. In this case, this means all you know is there is an array variable arr_of_chars initialized from a string literal.

When talking about segments, copying, etc., you're already talking about concrete implementations of C and what they're doing. Assuming your arr_of_chars is at file scope, and given a target machine/system whose binary format has data segments, it would be possible for a C compiler to put the initialized array directly in a data segment -- the observable behavior would be no different from an approach where the runtime first copies the bytes to your array.


1 Comment

Exactly. Another possibility for an initialization, if the string is short, would be to use a hardware register or an immediate, and depending on the usage of the char array, not to represent it in memory at all.
0

"...creates a constant character array (null terminated) somewhere in memory, and then copies the content of that array to arr_of_chars"

Indeed. The string literal "hello world" is stored somewhere in the .rodata section of the program, unless the compiler managed to optimize it away entirely (depends on your array's scope). From there it is copied into your array.

4 Comments

This is the only exception to the general rule for =: here it copies the string itself, not the pointer as it usually does :).
There's no exception here at all. This is an initialization, not an assignment.
What is .rodata ?-)
@JensGustedt Although the C standard does not know anything of computer memories and segments, the real world has industry standard naming conventions for segments. .rodata would be the read-only data segment, used by the popular ELF link file format, among others. A summary & explanation of the industry standard memory segments can be found here.
0

This will create a null-terminated string hello world\0 in the const segment.

In the main function this string will be copied to the character array.

Let me highlight a few lines from the assembly output to clarify this.

PUBLIC  ??_C@_0M@LACCCNMM@hello?5world?$AA@

This creates a public token.

CONST   SEGMENT
??_C@_0M@LACCCNMM@hello?5world?$AA@ DB 'hello world', 00H
CONST   ENDS

This assigns the constant null terminated string to the token.

lea rax, QWORD PTR arr_of_chars$[rbp]
lea rcx, OFFSET FLAT:??_C@_0M@LACCCNMM@hello?5world?$AA@
mov rdi, rax      ; Set destination to stack location
mov rsi, rcx      ; Set source to public token
mov ecx, 12       ; Set counter to number of times to repeat
rep movsb         ; Copy single byte from source to destination and increment locations

This sets up the source and destination and copies one byte at a time, 12 times: the 11 characters of "hello world" plus the null terminator. The destination is a location on the stack and the source is the public token.

4 Comments

How does examining the behavior of one concrete implementation answer this question?
@FelixPalmen - You're right. What I have posted is only an observation. But I'm not sure if the standard defines a method to execute this. However, the operation is an array being initialized with a string literal. So the string literal has to be stored somewhere and must be copied to the array during initialization.
It isn't specified by the standard, that's my point ;) And no, it's not absolutely necessary to store the string literal somewhere. It all depends on the target's binary format and the decisions of the compiler, and if this exact string literal is only used for initializing this variable, it could be a valid decision to "eliminate" it and put the initialized array in a data segment -- for example :)
I agree. So I guess I'll make a few more observations on some of the other well known C compilers as well. :)
0

This is a question of how strings are stored in C.

Strings can be stored in the following ways:

  1. Strings as character arrays
  2. Strings using character pointers

When strings are declared as character arrays, they are stored like other types of arrays in C. For example, if str[] is an auto variable, the string is typically stored in the stack segment; if it's a global or static variable, it is stored in the data segment.

Ex.

char str[] = "Hello_world";

When strings are stored using character pointers, this can be done in two ways:

  1. Read only string in a shared segment.

    Ex.

    char *str  =  "Hello_World";
    

In the above line, "Hello_World" is stored in a shared read-only location, but the pointer str is stored in read-write memory. You can change str to point to something else, but you cannot modify the characters it currently points to (attempting to do so is undefined behavior). So this kind of string should only be used when we don't want to modify the string at a later stage in the program.

  2. Dynamically allocated in the heap segment.

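    /* malloc requires #include <stdlib.h> */
    char *str = NULL;
    size_t size = 6;                /* five characters plus '\0' */
    
    str = malloc(size);             /* sizeof(char) is 1 by definition */
    if (str != NULL) {
        str[0] = 'H';
        str[1] = 'E';
        str[2] = 'L';
        str[3] = 'L';
        str[4] = 'O';
        str[5] = '\0';
    }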
    
