Controlling GCC optimization

Question

I'm trying to test the cache properties of a machine I have access to. To do this I am trying to read memory and time it. I vary the working set size and the stride access pattern to get different measurements.

The code looks like so:

clock1 = get_ticks();
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    *array[j];
  }
}
clock2 = get_ticks();

Now the issue is that at any reasonable optimization level, gcc will optimize out the read because it has no side effects. I can't turn optimization off either, or else all the loop variables will live in memory and cause extra reads. I've tried a few different things, like making array volatile and using inline functions that cast to volatile, but gcc's treatment of volatile variables is very hard to predict. What is the appropriate way to do this?

Not sure how feasible this is: compile the code you want to time to assembler with no optimization, then replace the C code with an asm {} block and compile the whole program.
– pmg, Commented Mar 16, 2011 at 22:42
Ulrich Drepper's "What Every Programmer Should Know About Memory" (akkadia.org/drepper/cpumemory.pdf) includes some benchmarks for cache properties.
– ninjalj, Commented Mar 16, 2011 at 23:47
Tongue in cheek: make sure that the loop computes the solution to an open math problem so that the compiler can't optimize it out: blog.regehr.org/archives/140
– Pascal Cuoq, Commented Mar 17, 2011 at 0:05

Paul R · Accepted Answer · 2011-03-16 22:32:18Z

2

One possibility is to make use of the array data in a way that can't easily be optimised away, e.g.

clock1 = get_ticks();
sum = 0;
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    sum += array[j];
  }
}
clock2 = get_ticks();
return sum;

sum should be in a register, and the add operation should add nothing significant to the loop timing.

If the test function and caller are both in the same compilation unit you may also need to ensure that you actually do something with the returned sum value, e.g. output it via printf.
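A minimal sketch of this approach as a standalone function, assuming array is a flat array of data_t (the question does not define data_t, so the typedef here is a placeholder) and using hypothetical parameter names. Note that it only keeps the loads alive if the caller uses the result, and the compiler may still restructure the two loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t data_t;  /* assumed element type; not given in the question */

/* Reads `touched` elements of `array`, `stride` elements apart, `passes`
   times, and returns their sum so that every load feeds a value the
   caller can use. The array must hold at least (touched - 1) * stride + 1
   elements. */
uint64_t sum_strided(const data_t *array, size_t touched, size_t stride,
                     unsigned passes)
{
    assert(stride > 0);
    uint64_t sum = 0;
    for (unsigned i = 0; i < passes; i++) {
        for (size_t j = 0; j < touched * stride; j += stride) {
            sum += array[j];
        }
    }
    return sum;
}
```

The caller would wrap the call in the two get_ticks() readings and then print the returned sum.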

answered Mar 16, 2011 at 22:32

Paul R


5 Comments

dschatz Over a year ago

GCC will optimize that out. For example, gcc can just multiply the value of sum after one run of the inner loop by 1000000 and it will be correct.

Paul R Over a year ago

@dschatz: you may need to ensure you actually do something with the returned sum value, e.g. output it via printf.

dschatz Over a year ago

Outputting it won't matter anyhow, it can optimize the two loops and get the value.

Paul R Over a year ago

@dschatz: OK - I see the problem now - I was looking at a different optimisation. I think you may need to make the array both global and volatile.

EboMike Over a year ago

@dschatz: It won't be able to optimize it if array is volatile.
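The volatile suggestion from these comments can be sketched as follows. This assumes a flat array of data_t (placeholder typedef) and reads it through a pointer to volatile instead of declaring the array itself volatile; GCC performs every access made through a volatile-qualified lvalue, so the loops cannot be collapsed. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t data_t;  /* assumed element type; not given in the question */

/* Reads `touched` elements, `stride` elements apart, `passes` times.
   Each read goes through a pointer to volatile, so the compiler has to
   perform it even though the value is discarded.
   Returns the number of reads issued. */
uint64_t read_strided_volatile(const data_t *array, size_t touched,
                               size_t stride, unsigned passes)
{
    assert(stride > 0);
    const volatile data_t *p = array;
    uint64_t reads = 0;
    for (unsigned i = 0; i < passes; i++) {
        for (size_t j = 0; j < touched * stride; j += stride) {
            (void)p[j];  /* volatile read; the value itself is not needed */
            reads++;
        }
    }
    return reads;
}
```

It is still worth checking the generated assembly (gcc -S) to confirm that the inner loop contains just the load and the loop bookkeeping.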

Martin Babacaev · Accepted Answer · 2011-03-17 00:55:53Z

1

With GCC, try specifying the used attribute on the index variables (i and j) so that the compiler does not optimize them away, even with optimization enabled:

int i __attribute__((used));
int j __attribute__((used));

clock1 = get_ticks();
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    *array[j];
    asm (""); // helps keep the loop body from being eliminated
  }
}
clock2 = get_ticks();

It is also good to know that GCC does not optimize away asm(...) statements that have no output operands. You can even use one with no assembler code in it, like this: asm("");
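A related sketch, not taken from the answer itself: an empty extended asm statement that takes the loaded value as an input operand, so the compiler has to produce that value on every iteration. It assumes GCC or a compatible compiler, a flat array of data_t (placeholder typedef), and hypothetical helper names. It does not rely on the used attribute, which GCC documents for variables with static storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t data_t;  /* assumed element type; not given in the question */

/* Hypothetical helper: emits no instructions, but tells the compiler that
   `value` is consumed in a register and that memory may have changed. */
static inline void keep_value(data_t value)
{
    __asm__ volatile("" : : "r"(value) : "memory");
}

/* Loads `touched` elements, `stride` elements apart, `passes` times, and
   hands each one to keep_value(). Returns the number of loads issued. */
uint64_t touch_strided(const data_t *array, size_t touched, size_t stride,
                       unsigned passes)
{
    assert(stride > 0);
    uint64_t loads = 0;
    for (unsigned i = 0; i < passes; i++) {
        for (size_t j = 0; j < touched * stride; j += stride) {
            keep_value(array[j]);
            loads++;
        }
    }
    return loads;
}
```

The "memory" clobber is what stops the compiler from reusing values it loaded on an earlier pass; without it, only the register operand would be kept alive.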

edited Mar 17, 2011 at 0:55

answered Mar 16, 2011 at 22:53

Martin Babacaev


Mario · Accepted Answer · 2011-03-16 22:55:33Z

0

I think you should really try to write it in assembler if you don't want the compiler to fuzz around with it. You just can't ensure any "tricks" would work forever. Something that works now might be optimized in a future version of the compiler. Also it's probably hard to predict if it worked. If you're able to check the assembler code to see if it worked (i.e. didn't optimize it), you should be able to write it from scratch as well?

answered Mar 16, 2011 at 22:55

Mario


R.. GitHub STOP HELPING ICE · Accepted Answer · 2011-03-16 23:06:25Z

0

Store the value to a volatile global variable at each iteration. This will ensure that actual writes happen (which are necessary to guarantee that the correct value will be seen in a signal handler, for instance).
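A minimal sketch of the volatile store, assuming a flat array of data_t (placeholder typedef) and hypothetical names. The stores to sink are guaranteed to happen; the loads happen in practice because each stored value has to come from the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t data_t;  /* assumed element type; not given in the question */

/* Global sink: every assignment to it is a side effect the compiler
   must emit. */
volatile data_t sink;

/* Copies `touched` elements, `stride` elements apart, into `sink`,
   repeating the walk `passes` times. */
void store_strided(const data_t *array, size_t touched, size_t stride,
                   unsigned passes)
{
    assert(stride > 0);
    for (unsigned i = 0; i < passes; i++) {
        for (size_t j = 0; j < touched * stride; j += stride) {
            sink = array[j];
        }
    }
}
```

The extra store per iteration goes to a single location that should stay in cache, but it does add an instruction to the timed loop.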

Alternatively, use something like

sum += *array[j]^i;

that's simple enough to compute but makes sure the compiler cannot easily optimize out loops with summation formulae.
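As a sketch under the same assumptions as the snippet above, except that array is taken to be a flat array of data_t (placeholder typedef) rather than an array of pointers, and with hypothetical names. Mixing the outer index into each term means the result of one pass can no longer simply be multiplied by the number of passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t data_t;  /* assumed element type; not given in the question */

/* Sums array[j] ^ i over `touched` elements, `stride` elements apart,
   for each of `passes` outer iterations. */
uint64_t xor_sum_strided(const data_t *array, size_t touched, size_t stride,
                         unsigned passes)
{
    assert(stride > 0);
    uint64_t sum = 0;
    for (unsigned i = 0; i < passes; i++) {
        for (size_t j = 0; j < touched * stride; j += stride) {
            sum += array[j] ^ i;
        }
    }
    return sum;
}
```

As with any sum-based approach, the result still has to be used by the caller, for example printed, or the whole computation can be dropped.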

answered Mar 16, 2011 at 23:06

R.. GitHub STOP HELPING ICE
