0

The current loop is:

#define N 3000
...
int i, j;
int a[N][N], b[N][N], c[N]; 
// Fill in b and c with random values

for (i = 0; i < n; ++i) {
  for (j = 0; j < n; ++j) {
    a[i][j] = b[i][j] / c[i];
  }
}

My optimized version unrolls both outer and inner loop:

for (int i = 0; i < N; i += 2) {
    for (int j = 0; j < N; j += 2) {
      a[i][j] = b[i][j] / c[i];
      a[i][j + 1] = b[i][j + 1] / c[i];
      a[i + 1][j] = b[i + 1][j] / c[i + 1];
      a[i + 1][j + 1] = b[i + 1][j + 1] / c[i + 1];
    }
  }

However, my instructor said that the second loop is not optimized very well. The indication to c(i) should be taken out of the loop over j. The loop is optimized by changing the order of indices. This way you make one sweep over the memory in the inner loop instead of zigzag-type of searches.

I am still not sure what he means since changing the order of indices would still make the loop traverse in a zig-zag type. What should be the correct solution for this case?

4
  • This is unclear: “The loop is optimized by changing the order of indices.” Is it a comment about the original code (the first code shown in the question) or “My optimized version” (the second code shown in the question)? It does look like the code is such that it would generally be better to unroll the loop on j, and the loop on i is not of much concern. Commented Mar 1, 2020 at 22:28
  • Unrolling the "i" portion forces many more cache misses. Better to unroll the "j" portion due to the memory layout. Commented Mar 2, 2020 at 0:10
  • @technosaurus You mean switching j to be the outer loop? Commented Mar 2, 2020 at 7:38
  • No, that would be way worse. Omit the i+1 unrolling. It causes a cache miss on every single loop. For a multidimensional array the inner loop should be over the right-most index. Unrolling more than a cache line size rarely shows improvements (usually ~64 bytes) but unrolling at least 16 bytes of data access can sometimes help the compiler use SIMD instructions. Commented Mar 3, 2020 at 16:23

2 Answers 2

2

Put int ci = c[i]; in the outer loop, and inner loop divides by ci. Note that any reasonable compiler will do this for you.

Sign up to request clarification or add additional context in comments.

3 Comments

Also, if this were FORTRAN, inverting the loop order would be appropriate.
One should not assume a compiler will do this for you. The code in the question is a simplified example. In practice, loops like this often appear in a routine in a separate translation unit. In such a routine, the compiler cannot know at compile time whether the c it was passed is the same as any of the rows of a, and therefore it cannot know whether some a[i][j] = … changes c[i], and so an optimization to treat c[i] as a loop invariant would be an error. (This can be overcome by inserting code to test the addresses at run-time, which either the programmer or the compiler could do.)
It is, but I am looking for away to improve the loop by changing the order of indices like my instructor said.
2

I'm not sure what your instructor is looking for, but you can make use of a reasonably well-known C technique known as Duff's Device here to help speed up your unrolled loop:

  init_arrays();

  precomputed_n = (N + 7) / 8;

  for(i = 0 ; i < N ; ++i)
    {
    to = a[i];
    from = b[i];
    ci = c[i];

    n = precomputed_n;

    switch(N % 8)
      case 0: do { *to++ = *from++ / ci;
      case 7:      *to++ = *from++ / ci;
      case 6:      *to++ = *from++ / ci;
      case 5:      *to++ = *from++ / ci;
      case 4:      *to++ = *from++ / ci;
      case 3:      *to++ = *from++ / ci;
      case 2:      *to++ = *from++ / ci;
      case 1:      *to++ = *from++ / ci;
                 } while (--n > 0);
    }

Duff's Device is a handy way to unroll loops which combines a while loop and a switch.

Try it online!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.