Nested loop performance when the innermost loop has one iteration

Question

I have a vector which is pseudo 3D most of the time, but it can also behave as pseudo 4D. By pseudo 3D/4D I mean that the 3/4 dimensions are not stored as 3/4 different arrays but all in one long array, so the 3/4 indices have to be converted to a single row index. This is achieved by an overload, like so:

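// Returned by reference so that the element can be assigned to.
T& vector::at(int i, int j, int k, int s = 0){
    // Row-major index: s varies fastest, i slowest.
    const int row = i*NJ*NK*NS + j*NK*NS + k*NS + s;
    return vector[row];
}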

Here NI, NJ, NK and NS are the sizes of the i, j, k and s dimensions, respectively.
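
To make the index mapping concrete, here is a minimal sketch. The type `Grid4` and its member names are hypothetical and not taken from the code above; it only assumes that all elements live in one contiguous `std::vector` and that `s` is the fastest-varying index. With NS = 1 and the default s = 0, the row formula reduces to the plain 3D mapping i*NJ*NK + j*NK + k.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container illustrating the pseudo 3D/4D layout.
template <typename T>
struct Grid4 {
    std::size_t NI, NJ, NK, NS;
    std::vector<T> data;  // all elements in one long array

    Grid4(std::size_t ni, std::size_t nj, std::size_t nk, std::size_t ns = 1)
        : NI(ni), NJ(nj), NK(nk), NS(ns), data(ni * nj * nk * ns) {}

    // Same mapping as i*NJ*NK*NS + j*NK*NS + k*NS + s, written in nested form.
    std::size_t row(std::size_t i, std::size_t j, std::size_t k,
                    std::size_t s = 0) const {
        return ((i * NJ + j) * NK + k) * NS + s;
    }

    // Element access by reference, so it can appear on the left of '='.
    T& operator()(std::size_t i, std::size_t j, std::size_t k,
                  std::size_t s = 0) {
        const std::size_t r = row(i, j, k, s);
        assert(r < data.size());
        return data[r];
    }
};
```

Usage would look like `Grid4<double> g(NI, NJ, NK);` for the 3D case and `Grid4<double> g(NI, NJ, NK, NS);` for the 4D case, with `g(i, j, k)` and `g(i, j, k, s)` respectively.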

I need to loop over all the elements and copy the values of another vector type (without the overload) into my vector with the overload:

int n=0;
for (int i=0; i<NI; ++i)
    for (int j=0; j<NJ; ++j)
        for (int k=0; k<NK; ++k)
            for(int s=0; s<NS; ++s)
                myvector(i, j, k, s) = othervector[n++];

What I'm wondering is: would this be very inefficient if NS=1 most of the time? I read in another question that today's CPUs are heavily optimized for linear access to memory. That would still be the case in this example, but my feeling is that the loop nest would be inefficient if the innermost loop only ever runs a single iteration. Any ideas whether this is undesirable from a performance point of view?

(I could do the following but I'm just wondering if the compiler/CPU is smart enough to make it efficient itself:

n=0;
if (NS>1){
    for (int i=0; i<NI; ++i)
        for (int j=0; j<NJ; ++j)
            for (int k=0; k<NK; ++k)
                for(int s=0; s<NS; ++s)
                    myvector(i, j, k, s) = othervector[n++];
} else {
    for (int i=0; i<NI; ++i)
        for (int j=0; j<NJ; ++j)
            for (int k=0; k<NK; ++k)
                myvector(i, j, k) = othervector[n++];
}

)
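
Since `s` is the fastest-varying index and `i` the slowest, the loop nest visits rows 0, 1, 2, ... in order, so the computed row always equals `n`. Under that assumption, one flat loop over the underlying storage does the same work whatever the value of NS. The sketch below uses hypothetical free functions (`copy_flat`, `copy_nested`) on plain `std::vector` storage rather than the original classes, and keeps the nested version only as a reference to compare against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat copy: a single loop over all NI*NJ*NK*NS elements.
template <typename T>
void copy_flat(std::vector<T>& dst, const std::vector<T>& src,
               std::size_t NI, std::size_t NJ, std::size_t NK, std::size_t NS) {
    const std::size_t total = NI * NJ * NK * NS;
    assert(dst.size() >= total && src.size() >= total);
    for (std::size_t n = 0; n < total; ++n)
        dst[n] = src[n];
}

// Reference version: the four-level nest with the row computed explicitly.
template <typename T>
void copy_nested(std::vector<T>& dst, const std::vector<T>& src,
                 std::size_t NI, std::size_t NJ, std::size_t NK, std::size_t NS) {
    assert(dst.size() >= NI * NJ * NK * NS && src.size() >= NI * NJ * NK * NS);
    std::size_t n = 0;
    for (std::size_t i = 0; i < NI; ++i)
        for (std::size_t j = 0; j < NJ; ++j)
            for (std::size_t k = 0; k < NK; ++k)
                for (std::size_t s = 0; s < NS; ++s)
                    dst[i*NJ*NK*NS + j*NK*NS + k*NS + s] = src[n++];
}
```

Whether a compiler reduces the nested form to the flat one on its own depends on the compiler and flags; comparing the generated assembly or timing both versions is the reliable way to find out.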

There's a good chance that a modern C++ compiler catches that pattern for optimization. Compilers usually do loop unrolling of all kinds, if there's enough information in the code to optimize it that way. At least Duff's device could be applied here.
– πάντα ῥεῖ, Commented Dec 13, 2020 at 16:13

ytlu · Accepted Answer · 2020-12-13 17:09:29Z


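For copying the whole array, std::copy_n (or memcpy) is likely to be more efficient than the hand-written loops, provided both vectors store their elements contiguously and in the same order: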

std::copy_n(othervector_pointer, (NI*NJ*NK*NS), myvector_pointer);

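You need to include the `algorithm` header. Alternatively, use memcpy from `cstring`. Note that memcpy takes a size in bytes, not an element count, and is only valid for trivially copyable element types:

memcpy(myvector_pointer, othervector_pointer, (NI*NJ*NK*NS) * sizeof(T));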
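
As a self-contained sketch of the two calls above: the wrapper names `copy_with_copy_n` and `copy_with_memcpy` are hypothetical, and the code assumes both buffers are contiguous, do not overlap, and hold at least `count` elements. The `static_assert` makes the trivially-copyable requirement of memcpy explicit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Copies count elements using std::copy_n; works for any copy-assignable T.
template <typename T>
void copy_with_copy_n(const T* src, T* dst, std::size_t count) {
    std::copy_n(src, count, dst);
}

// Copies count elements using std::memcpy; the size argument is in bytes.
template <typename T>
void copy_with_memcpy(const T* src, T* dst, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable element type");
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, count * sizeof(T));
}
```

With `std::vector` storage, the pointers would come from `data()`, for example `copy_with_copy_n(other.data(), mine.data(), NI*NJ*NK*NS)`. For trivially copyable types, standard library implementations commonly lower `std::copy_n` to a block memory copy, so the two are often equivalent in practice.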
