53

The C++11 standard guarantees that std::sort has O(n logn) complexity in the worst case. This is different from the average-case guarantee in C++98/03, where std::sort could be implemented with Quicksort (maybe combined with insertion sort for small n), which has O(n^2) in the worst case (for some specific input, such as sorted input).

Were there any changes in std::sort implementations in different STL libraries? How is C++11's std::sort implemented in different STLs?

7
  • 17
    Not a proper answer because I haven't conducted a full survey, but basically everyone was already using Introsort anyway, and that's why the stronger requirement is in practice not a problem for implementers. And even before Introsort, nobody with any sense used Quicksort with the leftmost element as the pivot. That was a simplification for the purposes of demonstrating the algorithm, although of course it does actually work fine for random data too. So in practice the worst case of Quicksort is not sorted input. Commented Mar 12, 2014 at 0:02
  • 2
    Yeah hybrid quicksorts (like introsort) are what pretty much everyone uses. I guess you could also find merge sort hybrids like timsort. Commented Mar 12, 2014 at 12:11
  • If you are trying to evaluate sorting methods, check on the number of constructors created during the sort. If the default constructor is inefficient, it will be slow, no matter how fast the sort is. Commented Mar 15, 2014 at 7:37
  • 2
    @bamboon it is not a duplicate because attracts attention to the strengthening requirements to the std::sort in the C++11 standard. And really the question is about: do C++11 std::sort implementations match C++11 standard O(n logn) worst case requirement? ( In mentioned previous answer it is said about O(n^2) worst case complexity of std::sort which is now should not be possible. ) Commented Mar 16, 2014 at 0:07
  • 1
    @cup You meant the copy constructor (assuming there is no move constructor and no fast specialization of std::swap), I assume? I don’t see why the default constructor should be invoked at all. Commented Mar 17, 2014 at 7:00

2 Answers 2

26

The question is, how can STL say std::sort worst case is O(N log(N)), even though it is in essence a QuickSort. STL's sort is IntroSort. IntroSort is in essence a QuickSort, the difference introduced change the worst case complexity.


QuickSort worst case is O(N^2)

What ever partitioning you choose, there exist a sequence that QuickSort will run on O(N^2). The partitioning you choose only decreases the probability of the worst case to occur. (Random Pivot Selection , Median-Of-Three, etc.)

EDIT: Thanks to @maxim1000 s correction. Quicksort with pivot selection algorithm Median of Medians has O(N log(N)) worst case complexity, but due to the overhead it introduces it isn't used in practice. It shows how good selection algorithm, can change the worst-case complexity through pivot selection, theoretically.


What does IntroSort do?

IntroSort limits the branching of QuickSort. This is the most important point, that limit is 2 * (log N). When limit is reached, IntroSort can use any sorting algorithm that has worst case complexity of O(N log(N)).

Branching stops when we have O(log N) subproblems. We can solve every subproblem O(n log n). (Lower case n stands for the subproblem sizes).

Sum of (n log n) is our worst case complexity, now.

For the worst case of QuickSort; assume we have an already sorted array, and we select always the first element in this array as the pivot. In every iteration we get rid of only the first element. If we went this way until the end, it would be O(N^2) obviously. With IntroSort we stop QuickSort, when we reach a depth log(N), then we use HeapSort for the remaining unsorted array.

16 -> 1  /**N**/
   \
    > 15 -> 1 /**N - 1**/
         \
          > 14 -> 1 /**N - 2**/
               \
                > 13 -> 1 /**N - log(N)**/  
                     \
                      > 12 /**(HeapSort Now) (N - log(N)) log (N - log(N))**/

Sum them up;

Until branching stops, N + (N - 1) + ... + (N - log(N)) operations done. Instead of using gauss to sum up, we can simply say N + (N - 1) + ... + (N - log(N)) < N log(N).

The HeapSort Part is (N - log(N)) log(N - log(N)) < N log(N)

Overall complexity < 2 N log(N).

Since the constants can be omitted, the worst case complexity of IntroSort is O(N log(N)).


Added Info: GCC STL implementation source code is here. Sort function is at line 5461.

Correction: *Microsoft .NET* sort Implementation is IntroSort since 2012. Related information is here.

Sign up to request clarification or add additional context in comments.

14 Comments

The question was a little bit different - how std::sort is implemented in various STL implementations now - did they changed or not? How can you prove that STL's sort it introsort? introsort is well described in wikipedia, though
GCC STL Source Code: gcc.gnu.org/onlinedocs/gcc-4.7.2/libstdc++/api/…. Start from line 5461 for stl::sort
"Quicksort's worst-case performance is always O(n^2)" is incorrect. There is at least one algorithm for pivot selection which deterministically gives a good pivot - en.wikipedia.org/wiki/Median_of_medians. It's rarely used in practice due to constants, but it makes order of complexity for Quicksort O(n*log n).
@maxim1000 thanks for the correction. I added the related information.
What I really wanted in the answer is the analysis of the STL source code from the people familiar with different implementations.
|
22

Browsing the online sources for libstdc++ and libc++, one can see that both libraries use the full gamut of the well-known sorting algorithms from an intro-sort main loop:

For std::sort, there is a helper routine for insertion_sort (an O(N^2) algorithm but with a good scaling constant to make it competitive for small sequences), plus some special casing for sub-sequences of 0, 1, 2, and 3 elements.

For std::partial_sort, both libraries use a version of heap_sort (O(N log N) in general), because that method has a nice invariant that it keeps a sorted subsequence (it typically has a larger scaling constant to make it more expensive for full sorting).

For std::nth_element, there is a helper routine for selection_sort (again an O(N^2) algorithm with a good sclaing constant to make it competitive for small sequences). For regular sorting insertion_sort usually dominates selection_sort, but for nth_element the invariant of having the smallest elements perfectly matches the behavior of selection_sort.

1 Comment

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.