
I was experimenting with predicates. I tried to implement a predicate for serializing issues in distributed systems. I wrote a simple example where the test function just returns true. While measuring the overhead, I stumbled upon an interesting problem: accessing the array in a for loop is 10 times slower than accessing it directly.

class Test {
    public boolean test(Object o) { return true; }
}

long count = 1000000000L;
Test[] test = new Test[3];
test[0] = new Test();
test[1] = new Test();
test[2] = new Test();
long milliseconds = System.currentTimeMillis();
for(int i = 0; i < count; i++){
    boolean result = true;
    Object object = new Object();
    for(int j = 0; j < test.length; j++){
        result = result && test[j].test(object);
    }
}
System.out.println((System.currentTimeMillis() - milliseconds));

However, the following code is almost 10 times faster. What can be the reason?

milliseconds = System.currentTimeMillis();
for(int i=0 ; i < count; i++) {
    Object object = new Object();
    boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
}
System.out.println((System.currentTimeMillis() - milliseconds));

Benchmark results on my i5:

  • 4567 msec for for loop access

  • 297 msec for direct access

  • Well, in the first piece of code you have an inner loop, which always iterates 3 times per iteration of the outer loop. In the second piece of code no such loop exists, and the whole expression can short-circuit right away, which the loop version can't. Commented Nov 28, 2016 at 18:49
  • Just to make things a little more comparable, how about changing the condition of the inner loop in the first piece of code to !result && j < test.length and initializing boolean result = false to have the same chance at short-circuiting? Commented Nov 28, 2016 at 18:52
  • It looks to me like you've just got a bad microbenchmark and the optimizer is doing something to eliminate most of the work in the second snippet. Commented Nov 28, 2016 at 18:54
  • Place one approach into a method called foo(), and the other one into a method called bar(), and then call these methods, maybe 20 times, alternatingly, from your main method. For me, after a few calls, the methods take exactly the same amount of time. (Starting with java -server ... might help). The first method may be a tad more complex, which may be a reason for the JIT to kick in a bit later, but in the end, there is no difference. Commented Nov 28, 2016 at 19:36
  • A side note: This is not a proper microbenchmark. In both cases, the result variable is not used, and the compiler could basically eliminate the whole program. And the fact that the result variable is read and written in the first approach, but only written in the second one, may be another reason why it's easier for the optimizer to figure out that the whole code is effectively useless... Commented Nov 28, 2016 at 19:39
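The comments above boil down to two fixes: actually consume the result, and run an untimed warm-up pass first. A minimal sketch of both (the class and method names here are illustrative, not from the question):

```java
class BenchmarkSketch {
    static class Test {
        public boolean test(Object o) { return true; }
    }

    // Fold every result into a sink so the JIT cannot prove the work is dead.
    static long runLoop(Test[] test, long count) {
        long sink = 0;
        Object object = new Object();
        for (long i = 0; i < count; i++) {
            boolean result = true;
            for (int j = 0; j < test.length; j++) {
                result = result && test[j].test(object);
            }
            if (result) sink++;   // the result now feeds an observable value
        }
        return sink;
    }

    public static void main(String[] args) {
        Test[] test = { new Test(), new Test(), new Test() };
        runLoop(test, 100_000);   // warm-up pass, deliberately untimed
        long start = System.nanoTime();
        long sink = runLoop(test, 1_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println(sink);          // printing the sink keeps it live
        System.out.println(elapsed + " ns");
    }
}
```

This does not make the two variants equivalent (they still differ in logic), but it removes the dead-code loophole the commenters point out.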

3 Answers


Due to the predictable result of test(Object o), the compiler is able to optimize the second piece of code quite effectively. The inner loop in the first piece of code makes this optimization impossible.

Compare the result with the following Test class:

static class Test {
    public boolean test(Object o) {
        return Math.random() > 0.5;
    }
}  

... and the loops:

    long count = 100000000L;
    Test[] test = new Test[3];
    test[0] = new Test();
    test[1] = new Test();
    test[2] = new Test();

    long milliseconds = System.currentTimeMillis();

    for(int i = 0; i < count; i++){
        boolean result = true;
        Object object = new Object();
        for(int j = 0; j < test.length; j++){
            result = result && test[j].test(object);
        }
    }

    System.out.println((System.currentTimeMillis() - milliseconds));
    milliseconds = System.currentTimeMillis();

    for(int i=0 ; i < count; i++) {
        Object object = new Object();
        boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
    }

    System.out.println((System.currentTimeMillis() - milliseconds));

Now both loops require almost the same time:

run:
3759
3368
BUILD SUCCESSFUL (total time: 7 seconds)

p.s.: check out this article for more about JIT compiler optimizations.


2 Comments

While this is true, I'm not sure whether it is +1-worthy: the crucial question is (or might be) why the JIT manages to optimize the original code better in one case than in the other. (I made some guesses in the comments; no time for a proper answer right now.)
I have to admit it was rather my gut feeling that led me to the answer above. Compare for instance this post: stackoverflow.com/questions/7772864/…. The author of the accepted answer supposes that the (similar) optimization is done by the JIT compiler. The root cause is definitely not the additional loop.

You are committing almost every basic mistake you can make with a microbenchmark.

  • You don't ensure the code cannot be optimized away, because you never actually use the calculation's result.
  • Your two code branches have subtly but decidedly different logic (as pointed out, variant two will always short-circuit). The second case is easier to optimize for the JIT due to test() returning a constant.
  • You did not warm up the code, so JIT compilation time may be included somewhere in the measured execution time.
  • Your test code does not account for the execution order of the test cases influencing the results. It's not fair to run case 1 and then case 2 with the same data and objects: by the time case 2 runs, the JIT will have already optimized the test method and collected runtime statistics about its behavior (at the expense of case 1's execution time).
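The last two bullets can be addressed by putting each variant in its own method and timing them alternately over many rounds, discarding the early, pre-JIT rounds. A sketch under those assumptions (method names are illustrative):

```java
class OrderFairness {
    static class Test {
        public boolean test(Object o) { return true; }
    }

    static final Test[] TESTS = { new Test(), new Test(), new Test() };

    static boolean loopVariant(Object o) {
        boolean result = true;
        for (int j = 0; j < TESTS.length; j++) {
            result = result && TESTS[j].test(o);
        }
        return result;
    }

    static boolean directVariant(Object o) {
        return TESTS[0].test(o) && TESTS[1].test(o) && TESTS[2].test(o);
    }

    // Time one variant, consuming every result so it cannot be eliminated.
    static long time(java.util.function.Predicate<Object> variant, long count) {
        Object o = new Object();
        long trues = 0;
        long start = System.nanoTime();
        for (long i = 0; i < count; i++) {
            if (variant.test(o)) trues++;
        }
        long elapsed = System.nanoTime() - start;
        if (trues != count) throw new AssertionError("unexpected result");
        return elapsed;
    }

    public static void main(String[] args) {
        // Alternate the variants; only the later, post-JIT rounds are meaningful.
        for (int round = 0; round < 20; round++) {
            long a = time(OrderFairness::loopVariant, 1_000_000);
            long b = time(OrderFairness::directVariant, 1_000_000);
            System.out.println("round " + round + ": loop=" + a + "ns direct=" + b + "ns");
        }
    }
}
```

For serious measurements a harness such as JMH handles warm-up, ordering, and dead-code elimination for you; this hand-rolled version only illustrates the idea.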

2 Comments

Why does variant 2 short-circuit? All of them are true and need to be evaluated. Am I missing something?
@Mustafa My statement is incorrect (I must have gotten the idea from a comment), but the effect is real: since you return a constant from test(), the entire expression becomes a constant after inlining by the JIT.

If a loop-header evaluation takes one unit of time, then in the first solution the loop-header evaluations take 3N units of time, while with direct access they take only N.

Besides the loop-header overhead, the first solution evaluates three && conditions per iteration, while the second evaluates only two.

And last but not least, Boolean short-circuit evaluation can cause your second, faster example to stop testing the conditions "prematurely": the entire result evaluates to false as soon as the first && operand is false.
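Short-circuiting itself is easy to observe by counting how many operands actually get evaluated (the check() helper below is illustrative, not from the question; note that in the question every test returns true):

```java
class ShortCircuitDemo {
    static int calls = 0;

    // Count each evaluation so we can see where && stops.
    static boolean check(boolean value) {
        calls++;
        return value;
    }

    public static void main(String[] args) {
        calls = 0;
        boolean r = check(false) && check(true) && check(true);
        // && stops at the first false: only one operand is evaluated.
        System.out.println(r + " after " + calls + " call(s)");  // false after 1 call(s)

        calls = 0;
        r = check(true) && check(true) && check(true);
        // All operands are true, so all three are evaluated.
        System.out.println(r + " after " + calls + " call(s)");  // true after 3 call(s)
    }
}
```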
