3

I have two functions counting the occurrences of a target char in the given input buffer. The functions vary only in how they communicate the result back to the caller; one returns the result and the other writes to a variable passed by reference.

#include <cstdlib>

#define BUF_LEN 0x1000

size_t check_count1(const char* buf, char target) {
    size_t count = 0;
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            count++;
        }
    }
    return count;
}

void check_count2(const char* buf, char target, size_t& count) {
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            count++;
        }
    }
}

I am puzzled by how Clang and GCC generate code for these two functions. The loop in check_count1 is vectorized, but for check_count2 it's not. Initially I thought this was due to pointer aliasing in the second case, but specifying __restrict has no effect. Here's the link to compiler explorer.

An older ICC compiler did just fine with both loops. What changed?

9
  • 3
    buf can alias count :-( Commented Nov 12 at 16:35
  • pointer aliasing would be my guess. Just one more reason to not use output parameters Commented Nov 12 at 16:36
  • 1
    @Jarod42 Here's an example with __restrict on both pointers: godbolt.org/z/oc94bxq71 Commented Nov 12 at 16:38
  • 1
    BTW, they differ also with initial value of count. Commented Nov 12 at 16:39
  • 1
    When if is dropped to have branchless code then it is also vectorized. Commented Nov 12 at 16:50

2 Answers 2

4

One reason is pointer aliasing, as NathanOliver points out in comment.

Another reason is that, in its current form, if buf[i] == target is false for every i, the modification of count needs to be skipped. This doesn't matter if count is a local variable, in which case an extra assignment to count is unobservable, but matters if count lives in rodata†, where modification is not allowed.

If the loop body is changed to always modify count, then GCC and Clang will vectorize. For example, this will also be vectorized:

void check_count2(const char* buf, char target, size_t& __restrict count) {
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            count++;
        } else {
            count++;
            count--;
        }
    }
}

As discussed in Crash with icc: can the compiler invent writes where none existed in the abstract machine?, ICC invents modifications even if it's not allowed to do so.

†: This can happen if count is initialized from const_cast<size_t&>(n), where n is a const global variable. Note that casting away constness is not undefined behavior: modifying const variable is.

Sign up to request clarification or add additional context in comments.

Comments

0

In the first case there is more flexibility for compiler to deal with the problem.

"count" is declared in the function and can be returned when the job is done. This means the compiler can keep it wherever it wishes, and it will likely keep it in a CPU register.

In the second case you pass a reference to the "count". That means a few of problems which make vectorization hard:

  • "count" is kept somewhere in RAM and the function got its address. Vectorized instructions would struggle with this anyway.

  • The compiler does not know your intentions so it has to play it safe. Do you intend the rest of the code to look at "count" after the job is done or do you want the changes to it to be potentially observable from the outside? The standard does not cover it clearly, so you are in the territory of undefined behaviour anyway. The old ICC compiler was probably making assumptions different to those that Clang and GCC make.

  • What if the array of "buf" and the location of "count" overlap? Technically, it is not impossible.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.