I want to compare two arrays and write the result to a bitmap, so I tried the following code:

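#include <vector>

// Writes lhs[i] == rhs[i] into bit i of dst (dst must hold at least len elements).
void wtf(int *lhs, int *rhs, int len, std::vector<bool>& dst){
    for (int i = 0; i < len; i++){
        dst[i] = lhs[i] == rhs[i];
    }
}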

In Compiler Explorer, the code above is not vectorized. If the result is written to a bool array instead, LLVM does vectorize the loop.

Why can't the std::vector<bool> loop be vectorized? What is the most efficient code for this task? I searched a lot and did not find an answer.
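For comparison, here is a minimal sketch of the byte-per-element form that the question reports LLVM does vectorize. It assumes std::vector<unsigned char> as the output type, and the name compare_bytes is hypothetical; whether a particular compiler actually vectorizes it depends on the version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of wtf(): one byte per result instead of one bit.
// Every iteration writes its own byte, so there is no read-modify-write of a
// shared word, which makes the loop a straightforward auto-vectorization candidate.
void compare_bytes(const int *lhs, const int *rhs, int len,
                   std::vector<unsigned char> &dst) {
    assert(len >= 0 && dst.size() >= static_cast<std::size_t>(len));
    unsigned char *out = dst.data();
    for (int i = 0; i < len; i++) {
        out[i] = lhs[i] == rhs[i];
    }
}
```

The trade-off is memory: one byte per result instead of one bit.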

  • std::vector<bool> is a weird thing, and many people consider it to be a mistake. It's allowed to store bools as single bits (instead of using a full byte), which makes it very clunky. No wonder the compiler misses an optimisation here. Commented Oct 24, 2023 at 9:30
  • I'm guessing vector<unsigned char> would vectorize fine, too. See also en.cppreference.com/w/cpp/container/vector_bool Commented Oct 24, 2023 at 9:31
  • You know vector<bool> is a bit-array, right? The C++ functions involved are a lot of code for a compiler to "see through". The actual asm needed on x86-64 is pcmpeqd / movmskps to get 4 bits at a time, plus some scalar shift / OR to combine them into 8-bit chunks. (AVX2 with 32-byte vectors could get an 8-bit mask from 8 ints at once.) Or compare 4 vectors and combine them with 2x packssdw / 1x packsswb before pmovmskb to get one 16-bit mask result. But IDK what chunk size the vector<bool> specialization uses internally. (No luck with x86-64-v4 or GCC: godbolt.org/z/b4PKnWbzT) Commented Oct 24, 2023 at 9:32
  • @Yksisarvinen: Howard Hinnant describes the situation well: isocpp.org/blog/2012/11/on-vectorbool - a bit-array is a useful data structure that it's good to have in the C++ standard library. vector<bool> is a terrible choice of name for it, though; that's the mistake. x86-64 SIMD (SSE2 and AVX2) can very naturally and efficiently generate bitmaps from SIMD compares, so in theory it's an efficient data structure. But in practice the C++ wrappers (both libstdc++ and libc++) are too complex for auto-vectorization to see through, it seems. Commented Oct 24, 2023 at 9:46
  • It's maybe not that common; std::vector<bool> is fairly unpopular. You might have more luck auto-vectorizing a loop that works in chunks of 8, 16, or 32 elements, building up a mask in a uint32_t and doing one assignment through a pointer, instead of many separate RMWs with complicated indexing to break down a bit-index into pointer and shift count. IIRC, there have been some previous SO Q&As where people had some success at getting something to auto-vectorize to a pmovmskb or movmskps. Commented Oct 24, 2023 at 13:01
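A minimal sketch of the chunking idea from the last comment: build each 32-bit mask in a local uint32_t and store it with a single write. It assumes a plain std::vector<std::uint32_t> bitmap (bit i % 32 of word i / 32) in place of std::vector<bool>. The name compare_to_bitmap is hypothetical, and this sketch does not claim that any given compiler turns the inner loop into pmovmskb / movmskps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stores lhs[i] == rhs[i] as bit (i % 32) of dst[i / 32].
// Each group of 32 results is accumulated in a local uint32_t and written
// with a single store, instead of one read-modify-write per element.
void compare_to_bitmap(const int *lhs, const int *rhs, std::size_t len,
                       std::vector<std::uint32_t> &dst) {
    assert(dst.size() * 32 >= len);
    std::size_t i = 0;
    std::size_t word = 0;
    // Full chunks of 32 elements: the fixed-size inner loop is the shape the
    // comment suggests may give auto-vectorization a better chance.
    for (; i + 32 <= len; i += 32, ++word) {
        std::uint32_t mask = 0;
        for (std::size_t j = 0; j < 32; ++j) {
            mask |= static_cast<std::uint32_t>(lhs[i + j] == rhs[i + j]) << j;
        }
        dst[word] = mask;
    }
    // Tail: fewer than 32 elements remain; the unused high bits stay zero.
    if (i < len) {
        std::uint32_t mask = 0;
        for (std::size_t j = 0; i + j < len; ++j) {
            mask |= static_cast<std::uint32_t>(lhs[i + j] == rhs[i + j]) << j;
        }
        dst[word] = mask;
    }
}
```

The caller sizes dst to (len + 31) / 32 words. Using plain words instead of std::vector<bool> also avoids the per-element proxy-reference indexing the comments point to.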
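The pcmpeqd / movmskps approach described in the comments can also be written by hand with SSE2 intrinsics (_mm_cmpeq_epi32 and _mm_movemask_ps). This is a minimal sketch that assumes an x86 target. It produces the same word layout (bit i % 32 of word i / 32), falls back to scalar code when __SSE2__ is not defined, and uses a hypothetical function name. It does not implement the packssdw / packsswb + pmovmskb or AVX2 variants, which would yield more bits per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: bit (i % 32) of dst[i / 32] = (lhs[i] == rhs[i]).
void compare_to_bitmap_sse2(const int *lhs, const int *rhs, std::size_t len,
                            std::vector<std::uint32_t> &dst) {
    assert(dst.size() * 32 >= len);
    std::size_t i = 0;  // global element index
    for (std::size_t w = 0; w < dst.size(); ++w) {
        std::uint32_t mask = 0;
        std::size_t j = 0;  // bit position within the current word
#if defined(__SSE2__)
        // pcmpeqd + movmskps: each step turns 4 int comparisons into 4 bits.
        for (; j < 32 && i + 4 <= len; j += 4, i += 4) {
            __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lhs + i));
            __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(rhs + i));
            __m128i eq = _mm_cmpeq_epi32(a, b);  // all-ones lanes where equal
            std::uint32_t m4 = static_cast<std::uint32_t>(
                _mm_movemask_ps(_mm_castsi128_ps(eq)));  // sign bit of each lane
            mask |= m4 << j;
        }
#endif
        // Scalar remainder (or the whole loop when SSE2 is unavailable).
        for (; j < 32 && i < len; ++j, ++i) {
            mask |= static_cast<std::uint32_t>(lhs[i] == rhs[i]) << j;
        }
        dst[w] = mask;  // one store per 32-bit word
    }
}
```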
