Why can LLVM not auto-vectorize comparing two arrays and writing the result to vector<bool>?


Asked 2 years, 1 month ago

Modified 1 year, 7 months ago


I want to compare two arrays and write the result to a bitmap, so I tried the following code:

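#include <vector>

// Compare lhs[i] with rhs[i] and store each result as one bit of dst.
// dst must already hold at least len elements.
void wtf(int *lhs, int *rhs, int len, std::vector<bool>& dst){
    for (int i = 0; i < len; i++){
        dst[i] = lhs[i] == rhs[i];
    }
}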

In Compiler Explorer, we can see that the above code is not vectorized. If we write the result to a boolean array instead, LLVM vectorizes it.
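For comparison, here is a minimal sketch of the boolean-array variant mentioned above. The function name `compare_to_bool_array` is made up for illustration, and whether a given compiler version actually vectorizes it depends on the optimization flags and target.

```cpp
#include <cstddef>

// Hypothetical name. Same comparison as the question's loop, but each
// result is stored as one whole bool (one byte on common platforms)
// instead of one bit, so every store is a plain, independent write.
void compare_to_bool_array(const int* lhs, const int* rhs,
                           std::size_t len, bool* dst) {
    for (std::size_t i = 0; i < len; i++) {
        dst[i] = lhs[i] == rhs[i];
    }
}
```

The trade-off is memory: this layout spends a full byte per result where a bitmap spends one bit.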

Why can the above code not be vectorized? What is the most efficient code to accomplish this task? I searched a lot and did not find the answer.

edited Apr 19, 2024 at 0:09

Peter Mortensen


asked Oct 24, 2023 at 9:21

YjyJeff


3

std::vector<bool> is a weird thing, and many people consider it to be a mistake. It's allowed to store bools as single bits (instead of using a full byte each), which makes it very clunky. No wonder the compiler misses an optimisation here.

– Yksisarvinen, Oct 24, 2023 at 9:30
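To make the "single bit" point concrete, here is a conceptual sketch of what storing one element into a packed bit array involves. It is not the code of libstdc++ or libc++. The 64-bit word size and the name `set_bit` are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Conceptual model only: real std::vector<bool> implementations differ in
// word type and details. Sets or clears bit i in an array of 64-bit words.
void set_bit(std::uint64_t* words, std::size_t i, bool value) {
    std::uint64_t mask = std::uint64_t{1} << (i % 64);  // position inside the word
    std::uint64_t& word = words[i / 64];                // word that holds bit i
    word = value ? (word | mask) : (word & ~mask);      // read-modify-write
}
```

Each store is a read-modify-write of a shared word, so consecutive iterations of the original loop touch the same memory location rather than independent ones.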
1

I'm guessing vector<unsigned char> would vectorize fine, too. See also en.cppreference.com/w/cpp/container/vector_bool

– ildjarn, Oct 24, 2023 at 9:31

You know vector<bool> is a bit-array, right? The C++ functions involved are a lot of code for a compiler to "see through". The actual asm needed on x86-64 is pcmpeqd / movmskps to get 4 bits at a time, and some scalar shift / OR to combine into 8-bit chunks. (AVX2 with 32-byte vectors could get an 8-bit mask from 8 ints at once.) Or compare 4 vectors and combine with 2x packssdw / 1x packsswb before pmovmskb to get one 16-bit mask result. But IDK what chunk size the vector<bool> specialization uses internally. (No luck with x86-64-v4 or GCC: godbolt.org/z/b4PKnWbzT)

– Peter Cordes, Oct 24, 2023 at 9:32
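The pcmpeqd / movmskps approach described in the comment above can be sketched by hand with SSE2 intrinsics. This is an illustration under stated assumptions, not a drop-in replacement: it is x86 / x86-64 only, the name `compare_to_bitmap_sse2` is made up, `len` is assumed to be a multiple of 8, and the output is a plain byte array rather than a vector<bool>.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Hypothetical helper: bit k of dst[j] holds lhs[8*j + k] == rhs[8*j + k].
// Assumes len is a multiple of 8 to keep the sketch short.
void compare_to_bitmap_sse2(const int* lhs, const int* rhs,
                            std::size_t len, std::uint8_t* dst) {
    for (std::size_t i = 0; i < len; i += 8) {
        __m128i a0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lhs + i));
        __m128i b0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rhs + i));
        __m128i a1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lhs + i + 4));
        __m128i b1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rhs + i + 4));
        // pcmpeqd: each 32-bit lane becomes all-ones where the inputs are equal
        __m128i eq0 = _mm_cmpeq_epi32(a0, b0);
        __m128i eq1 = _mm_cmpeq_epi32(a1, b1);
        // movmskps: gather the top bit of each lane into a 4-bit mask
        int lo = _mm_movemask_ps(_mm_castsi128_ps(eq0));
        int hi = _mm_movemask_ps(_mm_castsi128_ps(eq1));
        // scalar shift / OR to combine two 4-bit masks into one byte
        dst[i / 8] = static_cast<std::uint8_t>(lo | (hi << 4));
    }
}
```

A real implementation would also need a scalar tail for lengths that are not a multiple of 8.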
4

@Yksisarvinen: Howard Hinnant describes the situation well: isocpp.org/blog/2012/11/on-vectorbool - a bit-array is a useful data structure that it's good to have in the C++ standard library. vector<bool> is a terrible choice of name for it, though; that's the mistake. x86-64 SIMD (SSE2 and AVX2) can very naturally and efficiently generate bitmaps from SIMD compares, so in theory it's an efficient data structure. But in practice the C++ wrappers (both libstdc++ and libc++) are too complex for auto-vectorization to see through, it seems.

– Peter Cordes, Oct 24, 2023 at 9:46
1

It's maybe not that common; std::vector<bool> is fairly unpopular. You might have more luck auto-vectorizing a loop that works in chunks of 8, 16, or 32 elements, building up a mask in a uint32_t and doing one assignment through a pointer, instead of many separate RMWs with complicated indexing to break down a bit-index into pointer and shift count. IIRC, there have been some previous SO Q&As where people had some success at getting something to auto-vectorize to a pmovmskb or movmskps.

– Peter Cordes, Oct 24, 2023 at 13:01
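A rough sketch of the chunked loop suggested in the last comment might look like the following. The name `compare_to_bitmap_chunked` and the uint32_t output layout are assumptions for illustration, and nothing here guarantees that a particular compiler will auto-vectorize it. The point is only that each 32-element chunk ends in a single store with simple indexing.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bit k of dst[j] holds lhs[32*j + k] == rhs[32*j + k].
// dst must have room for (len + 31) / 32 words.
void compare_to_bitmap_chunked(const int* lhs, const int* rhs,
                               std::size_t len, std::uint32_t* dst) {
    std::size_t i = 0;
    // Full chunks: build the mask in a local variable, then store it once.
    for (; i + 32 <= len; i += 32) {
        std::uint32_t mask = 0;
        for (unsigned k = 0; k < 32; k++) {
            mask |= static_cast<std::uint32_t>(lhs[i + k] == rhs[i + k]) << k;
        }
        dst[i / 32] = mask;
    }
    // Tail: fewer than 32 elements remain; unused high bits stay zero.
    if (i < len) {
        std::uint32_t mask = 0;
        for (std::size_t k = 0; i + k < len; k++) {
            mask |= static_cast<std::uint32_t>(lhs[i + k] == rhs[i + k]) << k;
        }
        dst[i / 32] = mask;
    }
}
```

The result lives in a plain uint32_t array. Getting it into a std::vector<bool> is a separate problem that this sketch does not address.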
