The current loop is:
#define N 3000
...
int i, j;
int a[N][N], b[N][N], c[N];
// Fill in b and c with random values
for (i = 0; i < n; ++i) {
for (j = 0; j < n; ++j) {
a[i][j] = b[i][j] / c[i];
}
}
My optimized version unrolls both outer and inner loop:
for (int i = 0; i < N; i += 2) {
for (int j = 0; j < N; j += 2) {
a[i][j] = b[i][j] / c[i];
a[i][j + 1] = b[i][j + 1] / c[i];
a[i + 1][j] = b[i + 1][j] / c[i + 1];
a[i + 1][j + 1] = b[i + 1][j + 1] / c[i + 1];
}
}
However, my instructor said that the second loop is not optimized very well. The indication to c(i) should be taken out of the loop over j. The loop is optimized by changing the order of indices. This way you make one sweep over the memory in the inner loop instead of zigzag-type of searches.
I am still not sure what he means since changing the order of indices would still make the loop traverse in a zig-zag type. What should be the correct solution for this case?
j, and the loop oniis not of much concern.