The STL generally provides the best generic algorithms currently available. A common misunderstanding is to think they are the best, period, which is largely false...
Most STL containers and algorithms can easily be outperformed in terms of speed or memory usage, provided you lock down certain aspects specific to your needs: a fixed size known at compile time (allowing static allocation), particular use cases (e.g. a circular buffer of size 2^N, where the index can wrap with a bitwise AND), specific data types (enabling tweaks in alignment, allocation, etc.), operation without mutexes, and who knows what else. And this remains true even when using STL specializations, custom allocators, traits, ranges, or any other technique that lets you "finely" tune the containers used. The underlying issue is that the added code is not always "simple", nor necessarily optimized correctly during compilation. And the worst part is that you might waste an infernal amount of time trying to customize the STL and benchmarking your attempts, only to give up and fall back to plan B: custom code.
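As a minimal sketch of the first two points (fixed size known at compile time, power-of-two circular buffer), here is what such a specialized container might look like. The names (RingBuffer, push, pop) are illustrative, not taken from any library:

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Fixed-capacity ring buffer: capacity is a compile-time power of two,
// so indices wrap with a bitwise AND instead of a modulo or a branch.
template <typename T, std::size_t N>
class RingBuffer {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<T, N> data_{};   // static storage, no heap allocation
    std::size_t head_ = 0;      // index of the oldest element
    std::size_t count_ = 0;     // number of stored elements

public:
    bool push(const T& value) {
        if (count_ == N) return false;                 // buffer full
        data_[(head_ + count_) & (N - 1)] = value;     // wrap with AND, not %
        ++count_;
        return true;
    }

    std::optional<T> pop() {
        if (count_ == 0) return std::nullopt;          // buffer empty
        T value = data_[head_];
        head_ = (head_ + 1) & (N - 1);
        --count_;
        return value;
    }
};
```

Because the capacity is a template parameter, the storage lives in a std::array (no heap allocation at all) and the wrap-around is a single AND, two things a generic std::deque or std::queue cannot assume.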
In short, you're trading away generality, adaptability, and richness of interface in exchange for increased performance and/or a reduced memory footprint, with the (potentially major) drawback of reduced, or even near-zero, maintainability. The gain might be negligible, even invisible to the naked eye (e.g. config file parsing), or critical (20% faster execution in a critical loop). Only your project can determine whether the optimization makes sense. If your custom code lives in a "buried" part of the project, never touched and very thoroughly testable (and thus, most likely bug-free), then having specific code isn't a problem: back in the day, people even inserted assembly code directly for this reason, code that probably still exists and will never be touched again.
But what is certain is that trying to optimize from the start is a very bad idea: in 95% of cases you'll waste time (and thus money) and sacrifice maintainability (costing even more money when fixes or evolutions are needed), all for a negligible gain. So do your job with standard STL components, and IF a performance issue is identified, then consider the specific algorithm that might (or might not...) solve it.
Because it's crucial to remember that these kinds of optimizations are, in general, MARGINAL – except in critical loops executed millions or billions of times. The real optimizations, the ones that yield an order-of-magnitude improvement, come from changing the underlying algorithm, not just the containers used to implement it.
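To make that last point concrete with a deliberately simple example of my own (the function names are not from any library): both functions below answer "does this range contain a duplicate?". No amount of container tuning rescues the first one; the order-of-magnitude win comes from changing the algorithm, using the very same std::vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n^2): compares every pair, regardless of how the container is tuned.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n log n): same container, different algorithm (sort a copy, then scan).
bool has_duplicate_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```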
For std::deque, this only applies to the full container, whereas std::rotate can take any sub-range of the deque. Insertion in the middle of a deque is O(n) anyway, so in that case the complexity of std::rotate is fine.

To do better for a std::deque (or even a std::list), std::rotate would need to know more than its current interface exposes: it only sees a ForwardIt (or, at best, a LegacyBidirectionalIterator or LegacyRandomAccessIterator). And even a specialization written for std::deque would not work on a custom data structure with the same properties, unless someone writes yet another specialization of std::rotate (time-consuming and bug-prone). The question then becomes what kind of interface could be designed so that std::rotate can achieve good performance without being useful only for this (rather rarely used) function.

std::rotate also has the 'extra' effect that existing iterators now point to the 'rotated' values. If the values were not actually swapped (say, if a specialization relinked list nodes instead), then the assertion in

```cpp
auto it = list.begin();
std::rotate(list.begin(), std::next(list.begin()), list.end());
assert(it == list.begin());
```

would fail. This is also why std::iter_swap can't be specialised to be highly efficient for list iterators.

Until the contiguous_iterator concept was added in C++20, there was also no way to specify that your random-access container was just a pointer range and could use pointer-based methods. And std::distance must fall back to

```cpp
difference_type dist = 0;
while (it != end) { ++it; ++dist; }
return dist;
```

if your iterators are not random access (random access requires an O(1) operator-), even if you have a container such as a skip list where the distance could be computed faster. Once you start looking for optimisations, you'll find that you need bespoke methods for so many containers that you can no longer use the standard library algorithms.
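As a small illustration of the std::distance point (my own example, not from the discussion above): the call is O(1) on random-access iterators because it can use operator-, but it degrades to the O(n) increment loop shown above on list iterators, even though a structure like a skip list could in principle answer faster.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

int main() {
    std::vector<int> vec(1'000'000);
    std::list<int>   lst(1'000'000);

    // O(1): random-access iterators, computed with operator-.
    auto dv = std::distance(vec.begin(), vec.end());

    // O(n): bidirectional iterators, computed by walking the list node by node.
    auto dl = std::distance(lst.begin(), lst.end());

    assert(dv == dl);   // same answer, very different cost
}
```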