
I'm trying to test the cache properties of a machine I have access to. To do this I am trying to read memory and time it. I vary the working set size and the stride access pattern to get different measurements.

The code looks like so:

clock1 = get_ticks();
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    *array[j];  /* the value read is never used */
  }
}
clock2 = get_ticks();

The problem is that at any reasonable optimization level, gcc removes the read because it has no side effects. I can't turn optimization off either, because then the loop variables live in memory and every iteration performs extra reads. I've tried a few things, such as making array volatile and using inline functions that cast to volatile, but gcc's treatment of volatile variables is very hard to predict. What is the appropriate way to do this?

  • Not sure how feasible this is: compile the code you want to time to assembler with no optimization, then replace the C code with an asm {} block and compile the whole program. Commented Mar 16, 2011 at 22:42
  • Ulrich Drepper's "What Every Programmer Should Know About Memory" (akkadia.org/drepper/cpumemory.pdf) includes some benchmarks for cache properties. Commented Mar 16, 2011 at 23:47
  • Tongue in cheek: make sure that the loop computes the solution to an open math problem so that the compiler can't optimize it out: blog.regehr.org/archives/140 Commented Mar 17, 2011 at 0:05

4 Answers


One possibility is to make use of the array data in a way that can't easily be optimised away, e.g.

clock1 = get_ticks();
sum = 0;
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    sum += array[j];
  }
}
clock2 = get_ticks();
return sum;

sum should be in a register, and the add operation should add nothing significant to the loop timing.

If the test function and caller are both in the same compilation unit you may also need to ensure that you actually do something with the returned sum value, e.g. output it via printf.

5 Comments

GCC will optimize that out. For example, gcc can just multiply the value of sum after one run of the inner loop by 1000000 and it will be correct.
@dschatz: you may need to ensure you actually do something with the returned sum value, e.g. output it via printf.
Outputting it won't matter anyhow; gcc can still optimize away the two loops and compute the value.
@dschatz: OK - I see the problem now - I was looking at a different optimisation. I think you may need to make the array both global and volatile.
@dschatz: It won't be able to optimize it if array is volatile.
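
As a minimal sketch of the volatile suggestion in the comments above: instead of declaring the array itself volatile, the loop can read it through a pointer to volatile. GCC treats every access through a volatile-qualified lvalue as an access it must perform, so the loads should survive optimization. The helper name touch_strided, the data_t typedef, and the count/stride/repeats parameters are assumptions for illustration and do not come from the question.

```c
#include <stddef.h>

typedef int data_t;  /* assumption: the question does not show the element type */

/* Hypothetical helper: reads `count` elements spaced `stride` elements apart,
 * `repeats` times over. Each read goes through a pointer to volatile, so the
 * compiler is expected to emit one load per iteration. */
data_t touch_strided(const data_t *array, size_t count, size_t stride,
                     size_t repeats)
{
    const volatile data_t *p = array;  /* volatile view of the same memory */
    data_t last = 0;

    for (size_t r = 0; r < repeats; r++) {
        for (size_t j = 0; j < count * stride; j += stride) {
            last = p[j];  /* volatile read: should not be elided or hoisted */
        }
    }
    return last;  /* value of the last element touched, or 0 if none */
}
```

The timing calls would wrap the call to touch_strided. It is still worth inspecting the generated assembly (gcc -S) to confirm that the inner loop contains exactly one load per iteration.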

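For GCC, try specifying the used attribute on the index variables (i, j) so that the compiler does not optimize them away, even when optimization is enabled globally. Note that GCC documents used for variables with static storage, so it may be ignored on plain local variables: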

int i __attribute__((used));
int j __attribute__((used));

clock1 = get_ticks();
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    *array[j];
    asm (""); // help to avoid cycle's body elimination
  }
}
clock2 = get_ticks();

It is also good to know that GCC does not delete a basic asm statement (one with no operands), because such a statement is implicitly volatile. It can even be empty, like this: asm("");.
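
A variation on the empty asm idea, sketched here under the assumption of GCC or Clang extended asm: passing the loaded value to the empty asm statement as an input operand makes the compiler treat the value as used, and the "memory" clobber stops it from keeping array contents cached in registers across iterations. The names consume and read_strided, the data_t typedef, and the parameters are hypothetical.

```c
#include <stddef.h>

typedef int data_t;  /* assumption: the question does not show the element type */

/* Hypothetical sink: an empty asm statement that claims to read `value`.
 * No instruction is emitted, but the compiler must have the value ready in
 * a register, so the load that produced it cannot be removed. */
static inline void consume(data_t value)
{
    __asm__ volatile("" : : "r"(value) : "memory");
}

/* Reads `count` elements spaced `stride` elements apart, `repeats` times. */
data_t read_strided(const data_t *array, size_t count, size_t stride,
                    size_t repeats)
{
    data_t last = 0;

    for (size_t r = 0; r < repeats; r++) {
        for (size_t j = 0; j < count * stride; j += stride) {
            last = array[j];
            consume(last);  /* keeps the load alive without extra work */
        }
    }
    return last;
}
```

This relies on compiler-specific behavior rather than on the C standard, so checking the emitted assembly remains a sensible precaution.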


I think you should really write it in assembler if you don't want the compiler to mess around with it. You can't be sure that any "trick" will keep working: something that works now might be optimized away by a future version of the compiler. It is also hard to predict whether a trick worked. And if you are able to read the generated assembler to check that the access was not optimized out, you should be able to write it from scratch as well.


Store the value to a volatile global variable at each iteration. This will ensure that actual writes happen (which are necessary to guarantee that the correct value will be seen in a signal handler, for instance).

Alternatively, use something like

sum += *array[j]^i;

that's simple enough to compute but makes sure the compiler cannot easily optimize out loops with summation formulae.
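
Both suggestions in this answer can be sketched as follows. The names sink, store_strided and xor_sum_strided, the data_t typedef, and the parameters are assumptions for illustration. The store to the volatile global is guaranteed to happen on every iteration; whether each array load is re-issued is very likely but not strictly guaranteed, so the generated assembly is worth a look.

```c
#include <stddef.h>

typedef int data_t;  /* assumption: the question does not show the element type */

volatile data_t sink;  /* hypothetical global: every store to it must be emitted */

/* Variant 1: store each value read into the volatile global. */
void store_strided(const data_t *array, size_t count, size_t stride,
                   size_t repeats)
{
    for (size_t r = 0; r < repeats; r++) {
        for (size_t j = 0; j < count * stride; j += stride) {
            sink = array[j];  /* one volatile store per element read */
        }
    }
}

/* Variant 2: mix the outer loop counter into the sum so that the result is
 * not simply `repeats` times the sum of a single pass. Unsigned arithmetic
 * avoids signed overflow on long runs. */
unsigned long xor_sum_strided(const data_t *array, size_t count, size_t stride,
                              unsigned long repeats)
{
    unsigned long sum = 0;

    for (unsigned long r = 0; r < repeats; r++) {
        for (size_t j = 0; j < count * stride; j += stride) {
            sum += (unsigned long)array[j] ^ r;
        }
    }
    return sum;  /* the caller should use this value, e.g. print it */
}
```

The first variant adds a store to every iteration, which may itself affect the cache behavior being measured; the second adds only register arithmetic.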
