
I want to try optimizing this line of code:

for i in 0..len {
    slots[values[i].slot].count += 1;
}

(both of these lists are extremely long and the same size)

I have already optimized it like this (2.5x faster), prefetching so the access hits cache:

use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};

const PREFETCH_DISTANCE: usize = 32;
for i in (0..len).rev() {
    if i >= PREFETCH_DISTANCE {
        let pfs = values[i - PREFETCH_DISTANCE].slot;
        let fetch = &slots[pfs] as *const Slot as *const i8;
        unsafe { _mm_prefetch(fetch, _MM_HINT_T0) }
    }
    slots[values[i].slot].count += 1;
}

However, it is still not as fast as I was expecting. For example, this code is more than twice as fast for the same list length, just because it doesn't index anything:

for val in &*arr {
    let ptr = val as *const T;
    let val = unsafe { ptr.read() };
    let dif = integrated_expected_distribution(&val) - start_val;
    debug_assert!(dif >= 0.);
    let slot = (dif * end_mult) as usize;
    debug_assert!(slot < len);
    values.push(Slottable { value: val, slot });
}

Am I missing something else that I need to preload for that indexing? Or am I missing something that causes performance issues?

Here is a reproducible example:

    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    use std::time::Instant;
    use rand::{Rng, SeedableRng};

    let amount = 100_000_000;
    let highest_possible_num = 1_000_000_000;
    let seed = 44;

    let mut rng = rand::rngs::StdRng::seed_from_u64(seed);

    let mut vec2: Vec<usize> = (0..amount)
        .map(|_| rng.gen_range(0..=highest_possible_num))
        .collect();
    let vec1: Vec<usize> = (0..amount).map(|_| rng.gen_range(0..amount)).collect();
    let len = amount;
    const PREFETCH_DISTANCE: usize = 32;
    let loop_ = Instant::now();
    for i in (0..len).rev() {
        if i >= PREFETCH_DISTANCE {
            let pfs = vec1[i - PREFETCH_DISTANCE];
            let fetch = &vec2[pfs] as *const usize as *const i8;
            unsafe { _mm_prefetch(fetch, _MM_HINT_T0) }
        }
        vec2[vec1[i]] += 1;
    }
    let loop_duration = loop_.elapsed();
    println!("Loop time: {:?}", loop_duration);

The removed count and slot are each just part of an 8-byte data structure, so the change shouldn't have any effect on the example.

Edit: I think instruction-level parallelism isn't possible on this loop, so it becomes slower.

Edit2: I've tested some more and Edit1 was wrong. This code is way faster than my prefetched code:

    let mut test = 0;
    for i in (0..len).rev() {
        test += values[i].slot;
    }
    println!("test: {}", test);
  • Please provide a Minimal Reproducible Example. For example, does LLVM know the lengths are equal and are len? Commented Jul 28, 2024 at 21:05
  • Also, are there any patterns in your data? Commented Jul 28, 2024 at 21:06
    Prefetching does not improve it 2x for me, only marginally (but it does improve). I tried to unroll the loop but no avail. Perhaps with scatter and gather instructions you can do better, but that requires AVX-512. Commented Jul 30, 2024 at 11:07
    Changing prefetch distance makes it (slightly) worse, 32 is the best value. Overall prefetching gives a ~60ms advantage on my machine (I doubled the number of items 10x, total time ~750ms with prefetching). Commented Jul 30, 2024 at 17:39
    So it does seem to be related to the CPU, I'm running laptop i9-13900HX, a bit more powerful than yours ;P Commented Jul 31, 2024 at 15:29
