The STL generally provides the best generic algorithms currently available. A common misunderstanding is to think they are the best, period, which is largely false...
Most STL containers and algorithms can easily be outperformed in terms of speed or memory usage, provided you lock down certain aspects specific to your needs: a fixed size known at compile time (allowing static allocation), particular use cases (e.g. a circular buffer of size 2^N, where the index can wrap with a bitwise AND), specific data types (enabling tweaks in alignment, allocation, etc.), operation without mutexes, and who knows what else. And this remains true even when using STL specializations, custom allocators, traits, ranges, or any other technique that lets you "finely" tune the containers used. The underlying issue is that the added code is not always "simple", nor necessarily optimized correctly during compilation. And the worst part is that you might waste an infernal amount of time trying to customize the STL and benchmarking your attempts, only to give up and fall back to plan B: custom code.
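As a minimal sketch of the first two points (fixed size known at compile time, power-of-two circular buffer), here is what such a specialized container might look like. The names (RingBuffer, push, pop) are illustrative, not taken from any library:

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Fixed-capacity ring buffer: capacity is a compile-time power of two,
// so indices wrap with a bitwise AND instead of a modulo or a branch.
template <typename T, std::size_t N>
class RingBuffer {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<T, N> data_{};   // static storage, no heap allocation
    std::size_t head_ = 0;      // index of the oldest element
    std::size_t count_ = 0;     // number of stored elements

public:
    bool push(const T& value) {
        if (count_ == N) return false;                 // buffer full
        data_[(head_ + count_) & (N - 1)] = value;     // wrap with AND, not %
        ++count_;
        return true;
    }

    std::optional<T> pop() {
        if (count_ == 0) return std::nullopt;          // buffer empty
        T value = data_[head_];
        head_ = (head_ + 1) & (N - 1);
        --count_;
        return value;
    }
};
```

Because the capacity is a template parameter, the storage lives in a std::array (no heap allocation at all) and the wrap-around is a single AND, two things a generic std::deque or std::queue cannot assume.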
In short, you're trading away generality, adaptability, and richness of interface in exchange for increased performance and/or a reduced memory footprint, with the (potentially major) drawback of reduced, or even near-zero, maintainability. The gain might be negligible, even invisible to the naked eye (e.g. config file parsing), or critical (20% faster execution in a critical loop). Only your project can determine whether the optimization makes sense. If your custom code lives in a "buried" part of the project, never touched and very thoroughly testable (and thus, most likely bug-free), then having specific code isn't a problem: back in the day, people even inserted assembly code directly for this reason, code that probably still exists and will never be touched again.
But what is certain is that trying to optimize from the start is a very bad idea: in 95% of cases you'll waste time (and thus money) and sacrifice maintainability (costing even more money when fixes or evolutions are needed), all for a negligible gain. So do your job with standard STL components, and IF a performance issue is identified, then consider the specific algorithm that might (or might not...) solve it.
Because it's crucial to remember that these kinds of optimizations are, in general, MARGINAL – except in critical loops executed millions or billions of times. The real optimizations, the ones that yield an order-of-magnitude improvement, come from changing the underlying algorithm, not just the containers used to implement it.
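To make that last point concrete with a deliberately simple example of my own (the function names are not from any library): both functions below answer "does this range contain a duplicate?". No amount of container tuning rescues the first one; the order-of-magnitude win comes from changing the algorithm, using the very same std::vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n^2): compares every pair, regardless of how the container is tuned.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n log n): same container, different algorithm (sort a copy, then scan).
bool has_duplicate_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```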
For std::deque, this only applies to the full container, whereas std::rotate can take any sub-range of the deque. Insertion in the middle of a deque is O(n) anyway, so in that case the complexity of std::rotate is fine.

To do better for a std::deque (or even a std::list), std::rotate would need to know more than its current interface exposes: it only sees a ForwardIt (or, at best, a LegacyBidirectionalIterator or LegacyRandomAccessIterator). And even a specialization written for std::deque would not work on a custom data structure with the same properties, unless someone writes yet another specialization of std::rotate (time-consuming and bug-prone). The question then becomes what kind of interface could be designed so that std::rotate can achieve good performance without being useful only for this (rather rarely used) function.

std::rotate also has the 'extra' effect that existing iterators now point to the 'rotated' values. If the values were not actually swapped (say, if a specialization relinked list nodes instead), then the assertion in

```cpp
auto it = list.begin();
std::rotate(list.begin(), std::next(list.begin()), list.end());
assert(it == list.begin());
```

would fail. This is also why std::iter_swap can't be specialised to be highly efficient for list iterators.

Until the contiguous_iterator concept was added in C++20, there was also no way to specify that your random-access container was just a pointer range and could use pointer-based methods. And std::distance must fall back to

```cpp
difference_type dist = 0;
while (it != end) { ++it; ++dist; }
return dist;
```

if your iterators are not random access (random access requires an O(1) operator-), even if you have a container such as a skip list where the distance could be computed faster. Once you start looking for optimisations, you'll find that you need bespoke methods for so many containers that you can no longer use the standard library algorithms.
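As a small illustration of the std::distance point (my own example, not from the discussion above): the call is O(1) on random-access iterators because it can use operator-, but it degrades to the O(n) increment loop shown above on list iterators, even though a structure like a skip list could in principle answer faster.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

int main() {
    std::vector<int> vec(1'000'000);
    std::list<int>   lst(1'000'000);

    // O(1): random-access iterators, computed with operator-.
    auto dv = std::distance(vec.begin(), vec.end());

    // O(n): bidirectional iterators, computed by walking the list node by node.
    auto dl = std::distance(lst.begin(), lst.end());

    assert(dv == dl);   // same answer, very different cost
}
```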