Swift SIMD operations slower than a simple while loop when multiplying an array by a scalar


I have a buffer of bytes, and I want to multiply each byte by another byte, such as 0x20. One way is to simply iterate over the buffer and multiply each byte. This is obviously suboptimal; SIMD should be able to do this much faster. But in my measurements, using SIMD in Swift is much slower.

On a MacBook Pro M1 Max:
SIMD: 180ms for 100k iterations (operating on 64 bytes at a time)
Loop: 35ms for 6.4M iterations (operating on a single byte at a time)

Here is the code:

import Foundation

Data(repeating: 0x20, count: 6_400_000).withUnsafeBytes { bufferPointer in
    // Outer loop counter (6_400_000 bytes / 64 lanes = 100K iterations)
    // An empty while loop takes about 2ms
    var iteration = 0
    while iteration < 6_400_000 / SIMD64<UInt8>.scalarCount {
        let assumed = bufferPointer.assumingMemoryBound(to: SIMD64<UInt8>.self)
        let batch = assumed[0] // Will use the same batch all the time for testing purposes

        // This takes 180ms for 100k iterations (6_400_000 bytes / 64 bytes size of the simd)
        let spaceMask = batch &* 0x20
        /*
         Looking to do all these operations much faster, they are all slow
           let spaceMask = batch .== 0x20
           let result = batch &* 0x20
           let tabMask = batch .== 0x09
           let combinedMask = (spaceMask .| tabMask)._storage
        */

        // Using this loop, it takes 35ms total, running 6.4 million iterations in total
        var i = 0
        while i < 64 {
            let batchNumber = batch[i] &* 0x20
            i += 1
        }

        iteration += 1
    }
}
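
For reference, the mask operations listed in the commented-out block can be sketched as one small function. This is only an illustration under assumptions: `whitespaceCount(in:)` is a hypothetical name that does not appear in the code above, and it uses the public `replacing(with:where:)` and `wrappedSum()` methods instead of reading `_storage`.

```swift
// Hypothetical helper: counts the space (0x20) and tab (0x09) bytes in one
// 64-byte batch using the same lane-wise comparisons as the commented-out code.
func whitespaceCount(in batch: SIMD64<UInt8>) -> Int {
    let spaceMask = batch .== 0x20        // true in lanes holding a space
    let tabMask = batch .== 0x09          // true in lanes holding a tab
    let combinedMask = spaceMask .| tabMask

    // Turn the mask into 0/1 lanes, then sum them. The sum is at most 64,
    // so it cannot wrap in a UInt8.
    let ones = SIMD64<UInt8>(repeating: 0).replacing(with: 1, where: combinedMask)
    return Int(ones.wrappedSum())
}
```

Returning a value derived from the mask also means the comparisons have an observable result, which matters when timing them.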

I would expect the SIMD version to be at least 10x faster than the while loop; instead, it is about 5 times slower.
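
One thing that may be worth ruling out when timing this: neither `spaceMask` nor `batchNumber` is read afterward, so an optimizing compiler is permitted to drop that work. The sketch below is a hypothetical harness, not the code measured above. It folds every product into a checksum so both versions have an observable result. All function names are invented for illustration. The input is varied instead of a constant 0x20, because 0x20 &* 0x20 wraps to 0 in a UInt8.

```swift
import Foundation

// Hypothetical helpers for illustration; none of these names come from the post.

/// Multiplies each byte by `factor` (wrapping), 64 lanes at a time, and folds
/// the products into one byte. Trailing bytes beyond a multiple of 64 are ignored.
func simdChecksum(_ bytes: [UInt8], factor: UInt8) -> UInt8 {
    let width = SIMD64<UInt8>.scalarCount
    return bytes.withUnsafeBytes { raw -> UInt8 in
        var acc = SIMD64<UInt8>(repeating: 0)
        var offset = 0
        while offset + width <= raw.count {
            // loadUnaligned does not require the storage to be 64-byte aligned.
            let batch = raw.loadUnaligned(fromByteOffset: offset, as: SIMD64<UInt8>.self)
            acc = acc &+ (batch &* factor)
            offset += width
        }
        return acc.wrappedSum()
    }
}

/// Same computation, one byte at a time.
func scalarChecksum(_ bytes: [UInt8], factor: UInt8) -> UInt8 {
    var acc: UInt8 = 0
    for byte in bytes {
        acc = acc &+ (byte &* factor)
    }
    return acc
}

/// Runs `body` once and returns its result with the elapsed wall time in ms.
func milliseconds(_ body: () -> UInt8) -> (result: UInt8, ms: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}

// 6.4M bytes, as in the question, but with varying values.
let bytes = (0..<6_400_000).map { UInt8(truncatingIfNeeded: $0) }
let simd = milliseconds { simdChecksum(bytes, factor: 0x20) }
let loop = milliseconds { scalarChecksum(bytes, factor: 0x20) }
print("SIMD: \(simd.ms) ms, loop: \(loop.ms) ms, match: \(simd.result == loop.result)")
```

Because both functions return the same checksum for inputs whose length is a multiple of 64, the comparison also doubles as a correctness check. A single run like this is only a rough measurement.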

asked Nov 30, 2023 at 21:40

Vladislav


These measurements are in release mode, right?

– Alexander, Dec 1, 2023 at 3:59

@Alexander yes, tried multiple optimisations, same results.

– Vladislav, Dec 1, 2023 at 11:44

So interestingly, the SIMD types in Swift don't actually have special semantics to force them to lower into SIMD operations. Instead, they're implemented with loops in a particular way that causes LLVM to recognize them and auto-vectorize them (in a way that's suitable for the target platform). Something here might be preventing that optimization. In fact, it's quite likely that your while loop is being auto-vectorized. Could you compare the output assembly in both cases? E.g. on godbolt.org

– Alexander, Dec 1, 2023 at 14:54
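
To make the assembly comparison suggested in the comment above easier, the two variants could be isolated into standalone functions and compiled with optimizations enabled (for example `-O` on godbolt.org). This is a minimal sketch. The names `scaleSIMD` and `scaleLoop` are hypothetical, and `@inline(never)` is used only so that each function keeps its own body in the output.

```swift
// Hypothetical function names; each variant is kept out of line so its
// generated code can be inspected separately.

/// SIMD operator form, as in `batch &* 0x20`.
@inline(never)
func scaleSIMD(_ batch: SIMD64<UInt8>, by factor: UInt8) -> SIMD64<UInt8> {
    return batch &* factor
}

/// Explicit per-lane loop form, as in the inner while loop.
@inline(never)
func scaleLoop(_ batch: SIMD64<UInt8>, by factor: UInt8) -> SIMD64<UInt8> {
    var result = batch
    var i = 0
    while i < SIMD64<UInt8>.scalarCount {
        result[i] = batch[i] &* factor
        i += 1
    }
    return result
}
```

Both functions return their result, so the multiplication has to be performed in each case, and the two listings can then be compared directly.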
