5

I need to design (and code) a "customized" string class in C++. I am seeking any documentation and pointers on design issues or potential pitfalls I should be aware of.

Links are very welcome, as is the identification of problems (if any) with current string libraries (QString, std::string, and others).

  • What are the requirements of this string, and why can't you use std::string, QString, etc.? These requirements have to be the starting point of the investigation. Commented Aug 30, 2010 at 11:02
  • Most C++ programs build upon the standard library. If you are not allowed to use it, that is a very major constraint which you should state (and justify) up front. Commented Aug 30, 2010 at 11:26
  • Why can't you use any of the existing string implementations? Commented Aug 30, 2010 at 11:38
  • Mark's question is an important one. Why you want a custom solution will affect what pitfalls you are likely to encounter. For instance, if you need an immutable string class, you will encounter a very different set of constraints. Commented Aug 30, 2010 at 12:42
  • Although this question generated some interest ~12 years ago, I think it is off-topic per current standards. I will try to close it. Commented Aug 28, 2022 at 11:35

7 Answers

15

Despite the critics, I think this is a valid question.

std::string is not a panacea. It looks as if someone took a class from a pure-OO language and dumped it into C++, which is probably what happened.

Advice 1: Prefer non-member non-friend methods
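
To make Advice 1 concrete, here is a rough sketch. `MyString` and `starts_with` are hypothetical names of mine, and the `std::string` member is only a stand-in for whatever storage the real class would use. The point is that only the operations that must see the representation are members; the algorithm is a free function written against the public interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal string class: only operations that must touch
// the representation are members.
class MyString {
public:
    MyString() = default;
    explicit MyString(const char* s) : data_(s ? s : "") {}

    std::size_t size() const { return data_.size(); }
    char operator[](std::size_t i) const {
        assert(i < data_.size());  // precondition: index in range
        return data_[i];
    }

private:
    std::string data_;  // stand-in for the real storage (assumption)
};

// Non-member, non-friend: uses only size() and operator[], so it keeps
// working unchanged if the storage behind MyString is replaced.
inline bool starts_with(const MyString& s, const MyString& prefix) {
    if (prefix.size() > s.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (s[i] != prefix[i]) return false;
    }
    return true;
}
```

With this split, changing the internals means re-checking two member functions rather than every algorithm that ships with the class.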

That said, in this age of internationalization, I would certainly advise you to design a class that supports Unicode. And I do mean Unicode, not UTF-8 or UTF-16. It's ill-fitting (I think) to design the class around one specific encoding of the data. You can then provide methods to output the information in various encodings.

Advice 2: Support Unicode

Then, there are a number of points about memory allocation schemes:

  • Small String Optimization: the class contains pre-allocated space for a few characters (a dozen or two), and thus avoids heap allocation for short strings
  • Copy On Write: the various strings share a buffer so that copying is cheap; when one string needs to modify its content, it copies the buffer first if it's not the sole owner --> the issue is that multithreading introduces overhead here, and it's been shown that, for a general-purpose technique, this overhead could dwarf the actual copying cost
  • Immutability: "new" languages such as Java, C# or Python use immutable strings. Think of it as a pool of strings, all strings containing "Fooo" will point to the same buffer. Note that these languages support garbage collection, which rather helps here.

I would personally pick the "Small String Optimization" here (though it's not exclusive with the other two), simply because it's simple to implement and should actually benefit you (heap allocation cost, locality of reference issues).
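
A minimal sketch of the Small String Optimization, under some loud assumptions: `SmallString`, `kInlineCapacity` and `is_inline` are names I made up, the 15-character threshold is arbitrary, and the string cannot be modified after construction. Production implementations usually overlay the pointer and the inline buffer in a union to save space; here they are kept as separate members for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: short strings live inside the object,
// longer ones fall back to the heap.
class SmallString {
public:
    static constexpr std::size_t kInlineCapacity = 15;  // assumed threshold

    explicit SmallString(const char* s = "") {
        assert(s != nullptr);
        assign(s, std::strlen(s));
    }
    SmallString(const SmallString& other) { assign(other.c_str(), other.size_); }
    SmallString& operator=(const SmallString& other) {
        if (this != &other) assign(other.c_str(), other.size_);
        return *this;
    }
    ~SmallString() { delete[] heap_; }

    std::size_t size() const { return size_; }
    bool is_inline() const { return heap_ == nullptr; }
    const char* c_str() const { return heap_ ? heap_ : inline_; }

private:
    void assign(const char* s, std::size_t n) {
        // Allocate first, so a failed allocation leaves *this untouched.
        char* fresh = (n > kInlineCapacity) ? new char[n + 1] : nullptr;
        char* dst = fresh ? fresh : inline_;
        std::memcpy(dst, s, n);
        dst[n] = '\0';
        delete[] heap_;  // no-op when the old value was stored inline
        heap_ = fresh;
        size_ = n;
    }

    std::size_t size_ = 0;
    char* heap_ = nullptr;                    // non-null only for long strings
    char inline_[kInlineCapacity + 1] = {};   // in-object storage plus terminator
};
```

Copying a short `SmallString` never touches the allocator, which is where the gains in heap allocation cost and locality of reference come from.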

The other two techniques are somewhat complex in the face of multi-threading, and as such are likely to be error-prone and unlikely to yield any real benefit unless carefully crafted.
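
For comparison, a single-threaded Copy On Write sketch. `CowString`, `append` and `shares_buffer_with` are hypothetical names, and `std::shared_ptr<std::string>` is just a convenient stand-in for a hand-rolled reference-counted buffer. This version is deliberately not safe to share across threads: the `use_count()` check in `detach()` is exactly the spot that gets tricky once several threads are involved.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch: copies share one buffer until somebody writes.
class CowString {
public:
    explicit CowString(const char* s)
        : buf_(std::make_shared<std::string>(s ? s : "")) {}

    // The implicit copy constructor and assignment only copy the pointer.

    const std::string& view() const { return *buf_; }

    void append(char c) {
        detach();  // obtain a private buffer before modifying
        buf_->push_back(c);
    }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    void detach() {
        // Single-threaded assumption: the count cannot change under us.
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
    }

    std::shared_ptr<std::string> buf_;
};
```

Note that `std::shared_ptr` already maintains its count with atomic operations, so even this toy version pays part of the synchronization cost on every copy.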

And that brings my last advice:

Advice 3: Don't implement internal locking in an attempt to support multithreading

It will slow down the class when used in SingleThreaded context and will not yield as much benefit as you'd think when used in a MultiThreaded one.

Finally, you could perhaps find something that suits your tastes (or get some pointers) by browsing existing code. I don't promise that they exhibit "smooth" interfaces, though:

  • ICU UnicodeString: Unicode support, at least
  • std::string: over 100 member methods (counting the various overloads)
  • llvm StringRef: note how many algorithms are implemented as member methods :'(

2 Comments

Touches on lots of issues, and does the question credit in a way the others didn't bother to. But anyone who's considering reimplementing string probably has some very specific performance requirements, so it's questionable to make a blanket recommendation for Unicode support. Non-member non-friend functions have been all the rage with those up on design issues, while fully fleshed-out APIs are preferred by the end user. Many, like Alexandrescu, have found compromises where a kernel of functions is put into a template that fleshes out a std::string interface. Interoperability is valuable.
@Tony: the main idea behind non-member non-friend is to minimize the number of methods which actually need to know the internals. I still consider it important, however, to deliver them within the same header file. For Unicode support, I agree I am partial, but having to "internationalize" a library that does not support Unicode from scratch is extremely tiresome :/ Do you have a link to Alexandrescu's compromise? I am afraid I missed that.
3

Effective STL by Scott Meyers has some interesting discussion about possible std::string implementation techniques, though it covers rather advanced issues such as copy-on-write and reference counting.

2

Depending on what the "customization" is (e.g. a custom allocator), you may be able to do it via a template parameter of the std::basic_string class.
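
As a sketch of that route: the allocator below only counts allocations. `CountingAllocator`, `allocation_count` and `CountedString` are hypothetical names, and a real use case would plug in a pool or arena instead of forwarding to `::operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical global counter shared by all instantiations of the allocator.
inline std::size_t& allocation_count() {
    static std::size_t count = 0;
    return count;
}

// Minimal C++11 allocator: std::allocator_traits fills in everything else.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// Stateless allocator: any two instances are interchangeable.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Same interface as std::string, different memory policy.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;
```

One caveat: `CountedString` and `std::string` are distinct types, so passing one where the other is expected needs an explicit conversion (for example via `c_str()`).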

2

Herb Sutter gives a sample of a custom string class in GotW #29. You could use it as a starting point.

1

From a general-purpose point of view, a "new" string class would ideally combine the good points of std::string, CString, QString and others. A few points in random order:

  • MFC CString supports being passed to printf-like functions due to a very specific implementation. If you need or want this feature, I recommend buying the book "MFC Internals" by George Shepherd. Although the book is from 1996(!), its description of how CString is implemented should be worth it. http://www.amazon.com/MFC-Internals-Microsoft-Foundation-Architecture/dp/0201407213/ref=sr_1_1?ie=UTF8&s=books&qid=1283176951&sr=8-1
  • Check that your string class plays nicely with all interfaces you'll use it with (iostreams, Windows API, printf*, etc.)
  • Don't aim for full Unicode support (as in: collation, grapheme clusters, ...), as that will mean your class will never be done, but consider making it a wchar_t class with conversion options.
  • Consider making the ctor/function that creates your string objects from char* always take the specific encoding of the character array. (This can be helpful in environments that mix UTF-8 with other character sets.)
  • Look at the full CString interface and at the full std::string interface and decide what you are going to need and what you can skip.
  • Look at QString to see what the other two miss.
  • Do not provide implicit conversion to either char* or wchar_t*.
  • Consider adding convenient conversion functions to/from numeric types.
  • Don't write a string class without a full set of detailed Unit Tests!
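
A small sketch of three of the points above (encoding named at construction, no implicit conversion to char*, numeric convenience functions). Everything here is hypothetical: the names `Text`, `Encoding`, `utf8()` and `from_int()` are mine, only Latin-1 and UTF-8 are handled, UTF-8 input is not validated, and I store UTF-8 internally purely to keep the example short (the wchar_t suggestion above would work the same way).

```cpp
#include <cassert>
#include <string>

// Hypothetical encoding tag: the caller must always name one.
enum class Encoding { Utf8, Latin1 };

class Text {
public:
    // There is deliberately no constructor taking a bare char*.
    Text(const char* bytes, Encoding enc) {
        assert(bytes != nullptr);
        for (const char* p = bytes; *p != '\0'; ++p) {
            const unsigned char c = static_cast<unsigned char>(*p);
            if (enc == Encoding::Utf8 || c < 0x80) {
                // ASCII or already UTF-8: copied through, not validated.
                utf8_.push_back(static_cast<char>(c));
            } else {
                // Latin-1 0x80..0xFF becomes a two-byte UTF-8 sequence.
                utf8_.push_back(static_cast<char>(0xC0 | (c >> 6)));
                utf8_.push_back(static_cast<char>(0x80 | (c & 0x3F)));
            }
        }
    }

    // Convenience conversion from a numeric type.
    static Text from_int(long value) {
        return Text(std::to_string(value).c_str(), Encoding::Utf8);
    }

    // Explicit accessors instead of operator const char*().
    const std::string& utf8() const { return utf8_; }
    const char* c_str() const { return utf8_.c_str(); }

private:
    std::string utf8_;  // assumption: UTF-8 is the internal representation
};
```

Because the conversion is explicit, a call like `printf("%s", text.c_str())` makes it visible at the call site that raw bytes are leaving the class.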

0

The world doesn't need another string class. Is this homework? If not, use std::string.

6 Comments

Sorry, but if I ask a question, it is because I need an answer, not advice.
I think the key word in the OP is "customized". I'd assume that this one would need to do something that a standard string class wouldn't.
This is not advice. It is an actual and very pertinent answer that can save you hours, or rather weeks! As std::string gives access to the whole contained string, you can do whatever you want with a std::string and a free function working on it.
@Didier: No, this is advice, whether you like it or not. If you read the question carefully, you will see this is not an answer to it.
I didn't mean to come across as rude. But you will get better answers if you are clearer about the requirements. A restriction warranting your own string class is a pretty big one -- big enough to warrant specifying in your question; it will likely place its own implications on the design of your class which will not be covered in any generalized article / link.
0

The problem with std::string is... that you can't change it. Sometimes you need the basics of a std::string, but disagree with the implementation in your C++ library.

As an example, an implementation that employs thread-safe reference counting incurs lots of locking (or at least locked operations). Also, if you know that most of your strings will be short, you might want a string class that is optimized for that use case.

So even if you like the std::string API, or at least have learned to live with it, there is room for 'competing implementations' that are more or less workalikes.

PowerDNS would love to have one, as we currently pass many DNS host names around, and a large majority of them would fit in a fixed buffer of, say, 25 bytes, which would relieve a lot of new/delete pressure.
