I have two functions counting the occurrences of a target char in the given input buffer. The functions vary only in how they communicate the result back to the caller; one returns the result and the other writes to a variable passed by reference.

    #include <cstdlib>

    #define BUF_LEN 0x1000

    size_t check_count1(const char* buf, char target) {
        size_t count = 0;
        for (size_t i = 0; i < BUF_LEN; i++) {
            if (buf[i] == target) {
                count++;
            }
        }
        return count;
    }

    void check_count2(const char* buf, char target, size_t& count) {
        for (size_t i = 0; i < BUF_LEN; i++) {
            if (buf[i] == target) {
                count++;
            }
        }
    }

I am puzzled by the code that Clang and GCC generate for these two functions: the loop in check_count1 is vectorized, but the loop in check_count2 is not. Initially I thought this was due to pointer aliasing in the second case, but specifying __restrict has no effect. Here's the link to Compiler Explorer.
An older ICC compiler did just fine with both loops. What changed?
buf can alias count :-(
__restrict on both pointers: godbolt.org/z/oc94bxq71
count. if is dropped to have branchless code then it is also vectorized.