I'm trying to measure the cache properties of a machine I have access to. To do this, I read memory and time the reads, varying the working set size and the stride of the access pattern to get different measurements.
The code looks like this:
    clock1 = get_ticks();
    for (i = 0; i < 1000000; i++) {
        for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
            *array[j];  /* read only; the value is never used */
        }
    }
    clock2 = get_ticks();
Now the issue is that with a reasonable optimization level, gcc optimizes out the read because it has no side effects. I can't turn optimization off, or else all the loop variables will be read from and written to memory on every iteration. I've tried a few different things, like making array volatile and using inline functions that cast to volatile, but gcc's treatment of volatile variables is very hard to predict. What is the appropriate way to do this?
Put the read inside an asm {} block and compile the whole program.