I compiled it with MinGW (TDM) 4.8.1 using the -fdump-tree-optimized option, without -O2.
The first one does roughly this:
string tmp = a+b; // i.e. create a new string g, g += b, tmp = g (then dispose g)
tmp += c;
return tmp; // and dispose tmp
The second does it another way:
string tmp = a; // just copy a to tmp
tmp += b;
tmp += c;
return tmp; // and dispose tmp
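For reference, here is a minimal sketch of what the two functions being compared presumably look like. The names ConcatA and ConcatB and the local r come from the dumps; the exact signatures (three const references, returning by value) are my assumption, since the original source is not shown here.

```cpp
#include <string>

// Assumed shape of ConcatA: chained operator+.
// a + b builds a temporary string; the second operator+ builds the result from it.
std::string ConcatA(const std::string& a, const std::string& b, const std::string& c)
{
    return a + b + c;
}

// Assumed shape of ConcatB: copy once, then append in place.
std::string ConcatB(const std::string& a, const std::string& b, const std::string& c)
{
    std::string r = a; // copy a into the result
    r += b;            // append b to the same object
    r += c;            // append c to the same object
    return r;
}
```

Both return the same string for the same arguments; they differ only in how many intermediate string objects get created along the way.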
The resulting dumps look like this (ConcatA first):
void * D.20477;
struct basic_string D.20179;
<bb 2>:
D.20179 = std::operator+<char, std::char_traits<char>, std::allocator<char> > (a_1(D), b_2(D)); [return slot optimization]
*_3(D) = std::operator+<char, std::char_traits<char>, std::allocator<char> > (&D.20179, c_4(D)); [return slot optimization]
<bb 3>:
<bb 4>:
std::basic_string<char>::~basic_string (&D.20179);
D.20179 ={v} {CLOBBER};
<L1>:
return _3(D);
<L2>:
std::basic_string<char>::~basic_string (&D.20179);
_5 = __builtin_eh_pointer (1);
__builtin_unwind_resume (_5);
and for ConcatB:
void * D.20482;
struct string r [value-expr: *<retval>];
<bb 2>:
std::basic_string<char>::basic_string (r_1(D), a_2(D));
std::basic_string<char>::operator+= (r_1(D), b_3(D));
<bb 3>:
std::basic_string<char>::operator+= (r_1(D), c_4(D));
<bb 4>:
<L0>:
return r_1(D);
<L1>:
std::basic_string<char>::~basic_string (r_1(D));
_5 = __builtin_eh_pointer (1);
__builtin_unwind_resume (_5);
With -O2 applied, the compiler keeps ConcatB almost unchanged and works some magic on ConcatA (inlining functions, adding constant values to the memory allocation parts, declaring new functions), but the parts that matter stay the same.
ConcatA:
D.20292 = std::operator+<char, std::char_traits<char>, std::allocator<char> > (a_2(D), b_3(D)); [return slot optimization]
*_5(D) = std::operator+<char, std::char_traits<char>, std::allocator<char> > (&D.20292, c_6(D));
ConcatB:
std::basic_string<char>::basic_string (r_3(D), a_4(D));
std::basic_string<char>::append (r_3(D), b_6(D));
std::basic_string<char>::append (r_3(D), c_8(D));
So ConcatB comes out better than ConcatA: it performs fewer allocations, and allocations are very expensive when you're trying to optimize such small pieces of code.
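If allocations are the cost that matters, the same idea can be pushed one step further by sizing the buffer up front. This is only a sketch and not something I measured above; ConcatReserved is a hypothetical name, and the signature is assumed to match the functions being compared.

```cpp
#include <string>

// Hypothetical variant (not part of the original comparison):
// request the final capacity once, then append in place.
std::string ConcatReserved(const std::string& a, const std::string& b, const std::string& c)
{
    std::string r;
    r.reserve(a.size() + b.size() + c.size()); // room for the whole result
    r += a; // the appends below fit in the reserved capacity
    r += b;
    r += c;
    return r;
}
```

Whether this actually beats ConcatB depends on the library implementation and the string lengths, so it is worth checking the dump or a profile before relying on it.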
ConcatA constructs more than one string it's probably faster than ConcatB due to the potential extra realloc and copy in the second.
boost::range::join (boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/…; it only takes 2 source ranges but it is possible to make a truly multi-range join with c++11).