Why is sorting a std::vector of std::tuple's faster than sorting a vector of std::arrays?

Question

I was curious to see whether sorting a vector <vector<int>> would be slower than sorting a vector <array <int, 3>>. The dimensions of the vector is 1000000 by 3, and below is my driver code implementing this:

#include <iostream>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <vector>

using namespace std;

int main()
{
    vector <vector<int>> v(1000000, vector <int> (3));

    srand(time(nullptr));
    for(int i = 0; i < 1000000; ++i){
        for(int j = 0; j < 3; ++j){
            v[i][j] = rand();
        }
    }

    double start = clock();
    sort(v.begin(), v.end());
    cout << (clock()-start)/(CLOCKS_PER_SEC/1000) << endl;

    return 0;
}

Compiling with g++ -O3 sorting_test.cxx with gcc 7.5.0, I get a runtime of around 300 ms. Declaring v as a vector <array <int, 3>> halved the runtime to around 149 ms.

However, declaring v as a vector <tuple<int, int, int>> beat out both of the above options, with an average runtime of approximately 100 ms.

I can somewhat understand why the array option is faster than the vector option (array size is a constant expression, unlike the vector), but I have no idea why the tuple would beat both of them. Can somebody please explain this to me?

The code that fills the tuple <int, int, int>s is

srand(time(nullptr));
for(int i = 0; i < 1000000; ++i){
    get <0> (v[i]) = rand();
    get <1> (v[i]) = rand();
    get <2> (v[i]) = rand();
}

My guess it has to do with how operator < is defined for vector, array, and tuple. With vector and array you need a loop. tuple probably uses a fold operation which while it has the same number of comparisons, does not have the loop overhead. — NathanOliver
– NathanOliver, Commented Oct 6, 2020 at 19:07
Show the code that fills the vector of tuples. Also it might be better to srand(0) to have repeatable results. — rustyx
– rustyx, Commented Oct 6, 2020 at 19:08
Also, a vector points into a dynamically allocated memory which is much worse for cache utilization. Vector of arrays stores all the data contiguously. Moreover, swapping two vectors involve 48 bytes (on 64bit arch), while swapping of arrays only a half in this case. — Daniel Langr
– Daniel Langr, Commented Oct 6, 2020 at 19:10
Internally swaps are performed. Maybe the amount of memory to be swapped is just lower in the tuple case — Damien
– Damien, Commented Oct 6, 2020 at 19:12

Fatih BAKIR · Accepted Answer · 2020-10-06 19:16:47Z

8

While the disassembly for the whole programs are too large, this demonstrates the core difference between operator< for array and tuple: https://godbolt.org/z/h1Y33e

Essentially, in the tuple version, you have a fixed comparison of 3 elements whereas in the array version, you have a loop.

Though I'm surprised that the compiler did not unroll the loop.

Edit: looks like clang does optimize them to both, non-loop code: https://godbolt.org/z/cMExTb (I did not fully read it, but I only see forward jumps)

edited Oct 6, 2020 at 19:16

answered Oct 6, 2020 at 19:12

Fatih BAKIR

4,9301 gold badge25 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Daniel Langr Over a year ago

Might be interesting also to compare assembly for swap operations: godbolt.org/z/sGsK7Y.

Waqar Over a year ago

array should be equal to tuple in perf in this specific case (small array) but may lose if the array is larger.

Justin Over a year ago

For the clang asm, the array and tuple versions have the exact same instructions, but because libstdc++'s std::tuple's layout is backwards, they read the three ints in a different order.

Collectives™ on Stack Overflow

Why is sorting a std::vector of std::tuple's faster than sorting a vector of std::arrays?

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related