7

I was curious to see whether sorting a vector <vector<int>> would be slower than sorting a vector <array <int, 3>>. The dimensions of the vector is 1000000 by 3, and below is my driver code implementing this:

#include <iostream>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <vector>

using namespace std;

int main()
{
    vector <vector<int>> v(1000000, vector <int> (3));

    srand(time(nullptr));
    for(int i = 0; i < 1000000; ++i){
        for(int j = 0; j < 3; ++j){
            v[i][j] = rand();
        }
    }

    double start = clock();
    sort(v.begin(), v.end());
    cout << (clock()-start)/(CLOCKS_PER_SEC/1000) << endl;

    return 0;
}

Compiling with g++ -O3 sorting_test.cxx with gcc 7.5.0, I get a runtime of around 300 ms. Declaring v as a vector <array <int, 3>> halved the runtime to around 149 ms.

However, declaring v as a vector <tuple<int, int, int>> beat out both of the above options, with an average runtime of approximately 100 ms.

I can somewhat understand why the array option is faster than the vector option (array size is a constant expression, unlike the vector), but I have no idea why the tuple would beat both of them. Can somebody please explain this to me?

The code that fills the tuple <int, int, int>s is

srand(time(nullptr));
for(int i = 0; i < 1000000; ++i){
    get <0> (v[i]) = rand();
    get <1> (v[i]) = rand();
    get <2> (v[i]) = rand();
}
8
  • 3
    My guess it has to do with how operator < is defined for vector, array, and tuple. With vector and array you need a loop. tuple probably uses a fold operation which while it has the same number of comparisons, does not have the loop overhead. Commented Oct 6, 2020 at 19:07
  • 1
    Show the code that fills the vector of tuples. Also it might be better to srand(0) to have repeatable results. Commented Oct 6, 2020 at 19:08
  • 1
    See this for what a fold expression is. Commented Oct 6, 2020 at 19:10
  • 1
    Also, a vector points into a dynamically allocated memory which is much worse for cache utilization. Vector of arrays stores all the data contiguously. Moreover, swapping two vectors involve 48 bytes (on 64bit arch), while swapping of arrays only a half in this case. Commented Oct 6, 2020 at 19:10
  • 1
    Internally swaps are performed. Maybe the amount of memory to be swapped is just lower in the tuple case Commented Oct 6, 2020 at 19:12

1 Answer 1

8

While the disassembly for the whole programs are too large, this demonstrates the core difference between operator< for array and tuple: https://godbolt.org/z/h1Y33e

Essentially, in the tuple version, you have a fixed comparison of 3 elements whereas in the array version, you have a loop.

Though I'm surprised that the compiler did not unroll the loop.

Edit: looks like clang does optimize them to both, non-loop code: https://godbolt.org/z/cMExTb (I did not fully read it, but I only see forward jumps)

Sign up to request clarification or add additional context in comments.

3 Comments

Might be interesting also to compare assembly for swap operations: godbolt.org/z/sGsK7Y.
array should be equal to tuple in perf in this specific case (small array) but may lose if the array is larger.
For the clang asm, the array and tuple versions have the exact same instructions, but because libstdc++'s std::tuple's layout is backwards, they read the three ints in a different order.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.