
I've understood that string arrays end with a '\0' symbol. So, the following code should print 0, 1, 2 and 3. (Notice I'm using a range-based for() loop).

$ cat app.cpp
    #include <iostream>
    int main(){
        char s[]="0123\0abc";
        for(char c: s) std::cerr<<"-->"<<c<<std::endl;
        return 0;
    }

But it does print the whole array, including '\0's.

$ ./app
-->0
-->1
-->2
-->3
-->
-->a
-->b
-->c
-->

$ _

What is happening here? Why is the string not considered to end at the '\0'? Do C++ collections (C++11, I imagine) treat strings differently than classical C++ does?

Moreover, the number of characters in "0123\0abc" is 8. Notice the printout makes 9 lines!

(I know that std::cout<< runs fine, as well as strlen(), as well as for(int i=0; s[i]; i++), etc. I know about the end terminator; that's not the question!)

  • std::cout << "123\0abc" would print 123. Commented Jun 12, 2020 at 6:37
  • @Jarod42, obviously, that's not the question. Commented Jun 12, 2020 at 6:38
  • C arrays can decay to pointers, but the two are different types. Commented Jun 12, 2020 at 6:38
  • "Obviously". For me, both outputs are obvious too ;-) So I'm trying to find the cause of your misunderstanding. Commented Jun 12, 2020 at 6:43

4 Answers


s is of type char[9], i.e. an array containing 9 chars (including the null terminator '\0'). The range-based for loop simply iterates over all 9 elements; the null terminator '\0' is not treated specially.

Executes a for loop over a range.

Used as a more readable equivalent to the traditional for loop operating over a range of values, such as all elements in a container.

for(char c: s) std::cerr<<"-->"<<c<<std::endl; produces code roughly equivalent to

{
  auto && __range = s ;
  auto __begin = __range ;         // get the pointer to the beginning of the array 
  auto __end = __range + __bound ; // get the pointer to the end of the array ( __bound is the number of elements in the array, i.e. 9 )
  for ( ; __begin != __end; ++__begin) {
    char c = *__begin;
    std::cerr<<"-->"<<c<<std::endl;
  }
}

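When you declare a char[] as char s[] = "0123\0abc" (initialized from a string literal), s becomes a char[9]. The literal contains 8 explicit characters (the embedded \0 counts as one), and the compiler appends an implicit terminating \0, which needs space too.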

The range-based for loop you use does not treat the char[9] as anything other than an array of char with extent 9, and will happily hand every element of the array to the body of your loop. The \0 is just one of the chars in this context.


Be aware that a char does not necessarily hold a character; it can store any arbitrary 8-bit value. (On some machines char is wider; I have already encountered one with a 16-bit char, in which case no int8_t is available.) For such uses, signed char or unsigned char, according to specific needs, should be preferred, since the signedness of plain char is implementation-defined. Better still are int8_t or uint8_t from the cstdint header, provided they are available.

So your string literal actually is just an array of nine integral values (just as if you had created an int-array, only the type usually is narrower). A range based for loop will iterate over all of these nine 8-bit integers, and you get the output in your example.

These integral values only get a special meaning in specific contexts (functions), such as printf, puts or even operator<<, where they are interpreted as characters. When used as C-strings, a 0 value inside such an array marks the end of the string, but this 0 character is still part of that string. For illustration, puts might look like this:

int puts(char const* str)
{
    while(*str) // loops until the null character is encountered
    {
        char c = *str;

        // + get pixel representation of c for console, e. g 'B' for 66
        // + print this pixel representation to current console position
        // + advance by one position on console

        ++str;
    }
    return 0; // non-negative for success, could return number of
              // characters output as well...
}

4 Comments

@TedLyngmo Good hint... Thanks. Actually, that 16-bit-char DSP already mentioned had sizeof(char) == sizeof(int)...
"Porting" heaven! :-) I've been in the same room as a machine that has "32 bits for everything" - "char ... long ... fark it. I'll go with 32".
@TedLyngmo Then that machine is not standard conformant - long long requires 64 bits! (Just joking...)
I'm sure that machine didn't ask anyone for permission - It was the machine after all :-)
  • Here s is an array of char, so it includes the \0 characters too. When you use for(char c: s), the loop visits every char in the array. But the C standard defines a string as follows:

    A string is a contiguous sequence of characters terminated by and including the first null character.

    And

    [...] The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters...

    So, when you use C standard functions to print the array s as a string, you will see the result that you wanted. Example: printf("%s", s);

  • "the number of characters in "0123\0abc" is 8. Notice the printout makes 9 lines!"

    Again, printf("%s; Len = %d", s, strlen(s)); runs fine!

1 Comment

strlen returns size_t, so you must print it with %zu, not %d
