Google's Protocol Buffers use the C++ standard string class std::string as a variable-size byte array (see here), similar to Python, where the string class is also used as a byte array (at least until Python 3.0).

This approach seems to be good:

  • It allows fast assignment via assign() and fast direct access via data(), which vector<byte> does not offer.
  • It allows easier memory management and const references, unlike using byte*.
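For concreteness, here is a minimal sketch of the approach described above. The helper names make_buffer and byte_at are hypothetical and only illustrate assign() and data(); this is not how Protocol Buffers itself is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy raw bytes into a std::string used as a byte buffer.
std::string make_buffer(const unsigned char* bytes, std::size_t size) {
    std::string buffer;
    // assign(ptr, count) copies exactly `size` bytes, so embedded zero bytes survive.
    buffer.assign(reinterpret_cast<const char*>(bytes), size);
    return buffer;
}

// Hypothetical helper: read one byte back through data().
unsigned char byte_at(const std::string& buffer, std::size_t index) {
    assert(index < buffer.size());
    // char may be signed, so this is one of the static_casts the question mentions.
    return static_cast<unsigned char>(buffer.data()[index]);
}
```

Note that size() reports the byte count, not the distance to the first zero byte, which is what makes the string usable for binary data.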

But I am curious: is that the preferred way to represent a byte array in C++? What are the drawbacks of this approach (beyond a few static_casts)?

    You forgot: if your variable byte array really is a string, then the string class gives you useful string methods like find(), which std::vector does not have. (For std::vector, you would need to pull in <algorithm> or similar.) Commented Jan 10, 2010 at 14:30
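To illustrate the comment above, a small sketch comparing the two searches. The function names find_in_string and find_in_vector are made up for this example; std::string::find and std::search are the standard facilities.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// std::string: searching is a member function.
std::size_t find_in_string(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle);  // std::string::npos if absent
}

// std::vector: the same search needs std::search from <algorithm>.
// Returns std::string::npos when the needle is absent, to mirror find().
std::size_t find_in_vector(const std::vector<unsigned char>& haystack,
                           const std::vector<unsigned char>& needle) {
    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end());
    if (it == haystack.end()) {
        return std::string::npos;
    }
    return static_cast<std::size_t>(it - haystack.begin());
}
```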

3 Answers 3

8

Personally, I prefer std::vector, since std::string is not guaranteed to be stored contiguously and std::string::data() need not be O(1). std::vector will have a data() member function in C++0x.

5 Comments

And even without std::vector::data(), you can get the same result with &v[0] (after making sure it's not empty).
You can force std::string to store contiguously by calling its c_str() method. Crude, but effective. This way you force contiguous memory only when you need it.
Mike, data() is more suited here than c_str() as it doesn't bother appending the terminating 0. The latter can be even slower than the former.
@Mike: No, c_str() forces the string to give you a contiguous representation, but that doesn't force it to "store contiguously". The clue is that data() and c_str() don't invalidate iterators or references, so the implementation cannot "throw away" any previous non-contiguous storage it was using. It's undefined behaviour to modify the array you get back from c_str() or data(), whereas with vector you can modify the array you get from &v[0] and it modifies the vector. This is a load of old rubbish, and nobody implements string non-contiguously, which is why C++0x is changing it ;-)
No, scratch that thing about the "clue". There's a difference here between SGI's original STL and the standard, and I was looking at the STL. data() and c_str() can invalidate references. But it still doesn't mean they're required to convert the "real" storage of the string. They both return const char*, so unlike vectors, strings are not required by the standard to have any way of providing contiguous, modifiable data. This is what you'd need for instance to use a C-style reading API.
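As a sketch of the point made in these comments, here is how a vector can serve as the modifiable, contiguous buffer that a C-style reading API expects. fake_read is a hypothetical stand-in for such an API (it is not a real library call), and read_into_vector is likewise made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C-style read API: writes up to `capacity`
// bytes into `out` and returns how many bytes were written.
std::size_t fake_read(unsigned char* out, std::size_t capacity) {
    static const unsigned char source[] = {0xDE, 0xAD, 0xBE, 0xEF};
    std::size_t n = capacity < sizeof(source) ? capacity : sizeof(source);
    if (n > 0) {
        std::memcpy(out, source, n);
    }
    return n;
}

// Read into a vector by handing the API a pointer to the first element.
std::vector<unsigned char> read_into_vector(std::size_t capacity) {
    std::vector<unsigned char> buffer(capacity);
    if (buffer.empty()) {
        return buffer;  // &buffer[0] is undefined for an empty vector
    }
    std::size_t n = fake_read(&buffer[0], buffer.size());
    buffer.resize(n);   // keep only the bytes actually read
    return buffer;
}
```

Writing through &buffer[0] modifies the vector itself, whereas the pointers returned by std::string's c_str() and data() are const and must not be written through.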
3

std::string may have a reference-counted implementation, which may or may not be an advantage for what you're writing, so always be careful about that. std::string may not be thread safe. The potential advantage of std::string is easy concatenation; however, this can also be achieved easily using the STL.

Also, all those problems in relation to protocols disappear when using boost::asio and its buffer objects.

As for the supposed drawbacks of std::vector:

  1. Fast assignment can be done with a std::swap trick.
  2. Data can be accessed via &arr[0]; vectors are guaranteed (?) to be contiguous (at least, all implementations implement them so).
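The swap trick from the first point might look like the following sketch. take_bytes is a hypothetical helper name, and the answer does not say exactly which variant of the trick was meant, so treat this as one plausible reading.

```cpp
#include <vector>

// Hypothetical helper: move the contents of `source` into `target`
// without copying the bytes.
void take_bytes(std::vector<unsigned char>& target,
                std::vector<unsigned char>& source) {
    target.swap(source);  // constant time: only internal pointers are exchanged
    source.clear();       // discard what used to be in `target`
}
```

This only helps when the source buffer is no longer needed afterwards; if both copies must stay valid, a real copy is unavoidable.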

Personally I use std::vector for variable sized arrays, and boost::array for static sized ones.

10 Comments

Note that the behavior of &arr[0] is defined only if the vector contains at least one element.
std::vector is (effectively) guaranteed not to be ref counted - however, that doesn't make it any more thread-safe than std::string.
@Neil, we went through hell because of the thread unsafety of GCC's strings while working on a server. Switching to vectors did help.
Access from multiple threads is what thread safety is all about - it's easy to be thread safe if you only have one thread!
@Neil, it may be possible that the string implementation modifies shared state without proper locking. OK, the question should be: is std::string thread safe as long as I don't access or modify the same instance from multiple threads, but do access other instances of std::string concurrently?
2

I "think" that using std::vector is the better approach, because it was intended to be used as an array. It is true that all implementations (that I know or have heard of) store string "elements" in contiguous memory, but that doesn't make it standard; i.e., code that uses std::string as a byte array assumes that the elements are contiguous, which the standard does not require.

1 Comment

In the original C++ standard, std::vector didn't require contiguous storage, but the 2003 standard does require it (§23.2.4/1). C++0x will add a similar requirement for std::string (§[string.require]/5 in N3000).
