
I am currently implementing selection sort, using the code below and the driver file (also below) to test it. I am trying micro-optimizations to see what speeds it up.

  public static void selectionsort(int[] data) {
    for (int i = 0; i < data.length; i++) {
      int minx = i;  // index of the smallest element seen so far
      for (int j = i + 1; j < data.length; j++) {
        if (data[j] < data[minx]) {
          minx = j;
        }
        // else {
        //   minx = minx;
        // }
      }
      // swap the minimum of the remaining elements into position i
      int swap = data[minx];
      data[minx] = data[i];
      data[i] = swap;
    }
  }
import java.util.Arrays;
import java.util.Random;

public class Driver {
    public static void main(String[] args) {
        Random r = new Random(12);
        // CODE-BLOCK-A
        // run selection sort version of tests
        double avg = 0;
        for (int i = 0; i < 100; i++) {
            long time = System.currentTimeMillis();
            int[] sgarr = randIntArr(Integer.parseInt(args[0]), r);
            int[] tests = Arrays.copyOf(sgarr, sgarr.length);
            switch (args[1]) {
                case "selection":
                    Sorts.selectionsort(sgarr);
                    break;
                case "bubble":
                    Sorts.bubblesort(sgarr);
                    break;
                default:
                    Sorts.insertionsort(sgarr);
                    break;
            }
            // verify against the library sort
            Arrays.sort(tests);
            if (!equals(tests, sgarr)) {
                System.out.println("FAIL TO SORT!");
                System.exit(0);
            }
            // System.out.println("SUCCESS!");

            long etime = System.currentTimeMillis();
            avg += (etime - time);
        }

        System.out.println("Took " + avg / (100.0 * 1000.0) + " seconds");
    }

    public static int[] randIntArr(int count,Random r) {
        int seed = r.nextInt(12433);
        r.setSeed(seed);
        int[] arr = new int[count];
        for (int i = 0; i < count; i++) {
            arr[i] = (int) (r.nextDouble() * 100000);
        }
        return arr;
    }

    public static boolean equals(int[] d1, int[] d2) {
        if (d1.length != d2.length) {
            return false;
        }
        for (int i = 0; i < d2.length; i++) {
            if (d1[i] != d2[i]) {
                return false;
            }
        }

        return true;
    }
}

What I have tried to speed it up is adding an else block that should do nothing. I expected this to slow it down, since the loop still checks the if condition every iteration like normal but now also assigns minx to itself. Instead it is faster, and if I use an empty else block the improvement goes away.

However, when I build and run it with this command with the else block commented out, I get:

javac Driver.java && java Driver 40000 selection
Took 0.34416 seconds

When I uncomment the else block and run the same command, it gets noticeably faster:

Took 0.30086 seconds

Comparing the best-case runs of each version, that is about a 13.43% difference.

I am assuming this change is significant enough to not just be random, since I ran each version multiple times. I then asked my CS teacher about it and he explained it is probably branch prediction, which does make sense. What I wonder is whether I can get the same performance improvement without an essentially useless else statement, and if I can't, whether it is considered better practice to keep it or remove it. And is 13.43% even significant? I don't believe it is down to my computer, because I ran it many times and the values were within 3% of each other each time. This is being run on x86, if that matters.
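For reference, one way I could try expressing the same update without the useless else is a conditional expression. This is only a sketch; whether the JIT turns either form into branch-free machine code is up to HotSpot and would have to be measured:

  // Inner loop rewritten with a conditional expression instead of if/else.
  // Whether this ends up branchless after JIT compilation is not guaranteed.
  for (int j = i + 1; j < data.length; j++) {
    minx = (data[j] < data[minx]) ? j : minx;
  }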


Update: I am currently testing with this JMH main file:

package org.example;



import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class Main {

    @Param({"100", "1000", "10000"}) // You can add more values for array sizes
    private int arraySize;

    private int[] randomArray;

    @Setup(Level.Iteration)
    public void setup() {
        // Initialize the array with random data
        randomArray = createRandomArray(arraySize);
    }

    @Benchmark
    public void selectionSortBenchmark() {
        Sorts.selectionsort(randomArray.clone());
    }

    private int[] createRandomArray(int size) {
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            array[i] = (int) (Math.random() * 1000); // Adjust the range as needed
        }
        return array;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(Main.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(5)
                .build();

        new Runner(opt).run();
    }
}
  • minx=minx; could (and hopefully should) optimize away to nothing, same as if there was no else branch. But perhaps the different code is enough to get the JVM to JIT-compile to different machine code. Possibly branchy vs. branchless (like x86 cmov or AArch64 csel). Commented Nov 30, 2023 at 3:33
  • That's what I would expect too, but that's not what happens: having minx=minx; makes it faster, while an empty else and no else are the same speed. That's what confuses me. Commented Nov 30, 2023 at 3:35
  • Ok, empty else being the same as no else makes sense; that might be removed even during javac compilation to bytecode. Oh, your timing doesn't include any warm-up, so it includes runs where the JVM is just interpreting the bytecode before deciding to make fully optimized asm based on profiling data it gathered from those runs. The = is probably there in the bytecode. Normally you'd use JMH or something to only time optimized asm, since that's what long-running Java programs spend most of their time doing. How do I write a correct micro-benchmark in Java? Commented Nov 30, 2023 at 3:39
  • Thanks, I am going to try that out and I'll add the findings to the post in an edit. That could be the issue. Commented Nov 30, 2023 at 3:42
  • I have an i7-11800H (Tiger Lake). Tests are still running, but it does seem it was a test issue. Regardless, I'll be sure to look into the sources you gave; they seem interesting. Commented Nov 30, 2023 at 4:02

1 Answer


The reason, as @PeterCordes said in the comments, was a testing issue. When using proper benchmarking software like JMH, all the data points are very similar, and the remaining variation is well within what my computer having more or fewer resources available (or some other randomness) would explain. Here are the results when using JMH:

With the else block commented out:
Main.selectionSortBenchmark          100  avgt    5   0.003 ± 0.001  ms/op
Main.selectionSortBenchmark         1000  avgt    5   0.266 ± 0.050  ms/op
Main.selectionSortBenchmark        10000  avgt    5  19.087 ± 0.702  ms/op
With the else block (minx = minx;) uncommented:
Main.selectionSortBenchmark          100  avgt    5   0.003 ±  0.001  ms/op
Main.selectionSortBenchmark         1000  avgt    5   0.234 ±  0.010  ms/op
Main.selectionSortBenchmark        10000  avgt    5  18.468 ±  3.204  ms/op
With an empty else block:
Main.selectionSortBenchmark          100  avgt    5   0.003 ±  0.001  ms/op
Main.selectionSortBenchmark         1000  avgt    5   0.273 ±  0.005  ms/op
Main.selectionSortBenchmark        10000  avgt    5  20.022 ±  2.740  ms/op
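
As a side note, even without JMH the gap in the original Driver numbers should shrink if a few untimed warm-up iterations run before the measured loop, so that HotSpot has already JIT-compiled selectionsort when timing starts. A rough sketch of that idea (JMH remains the better tool):

        // Untimed warm-up before the measured loop in Driver.main:
        // sort a few throwaway arrays so the JIT has already compiled
        // selectionsort when the timed iterations begin.
        for (int w = 0; w < 20; w++) {
            Sorts.selectionsort(randIntArr(Integer.parseInt(args[0]), r));
        }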

4 Comments

Ok good, that's the expected result. Your test in the question is almost certainly measuring a real effect that's statistically significant, but only during the warm-up phase. So real or not, it's probably not interesting except for people analyzing details of the JVM. And no reason to expect it to occur in other programs with different surrounding code or different if/else bodies, or it could swing the other way and get lucky with the no-else version being faster. It might or might not also be specific to Intel CPUs, or to Intel Ice/Tiger Lake CPUs.
I've been trying to understand the warm-up phase and what it means. I get that it's where the compiler is finding hotspots, changing resource allocation, and optimizing itself. But say I run it with a 100-million-length array and then a 5k-length array: would the 5k array be sorted faster after the 100-million-array warm-up, or after ten iterations on 15k-length arrays? Or am I missing what the warm-up phase is entirely?
The JVM might decide to stop in the middle of the first array and generate optimized machine code (JIT compile) that function after the loop had been running for a long time, like multiple seconds. I think normally it decides when to make an optimized version of a function as its being called, but most functions that take ages might only get called once in the life of a long-running program. (And an O(N^2) sort on 100M elements would take a fair while.) But yeah, the second call (for the 5k array) will definitely be running fully optimized machine code.
Some relevant articles: developers.redhat.com/articles/2021/06/23/… / en.wikipedia.org/wiki/Just-in-time_compilation and a Q&A How does the JVM decided to JIT-compile a method (categorize a method as "hot")? . I've just skimmed those things that came up high in my google search results, but they look reasonable (unlike a couple other hits that I skimmed which didn't look as useful.) The current HotSpot JVM is the result of many years of development work.
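If you want to watch when that compilation happens with the original Driver, HotSpot can log its decisions with -XX:+PrintCompilation (the output format varies between JVM versions); lines mentioning Sorts::selectionsort show roughly when, and at which tier, the method was compiled:

javac Driver.java
java -XX:+PrintCompilation Driver 40000 selection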
