5

This is a follow up to Restricted access for allocated arrays belonging to separate object after discovering that returning by value across compilation units doesn't help to hint the compiler about disjoint memory allocations.

I have two functions, make_v1 and make_v2, that return an array-like object, by value. (This could be std::vector for example, the important point is that the copy constructor clearly allocates memory).

#include<vector>

template<class It> void modify_v(It);

void dreaded_function();

template<class It1, class It2>
void foo(It1 a, It2 b) {
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function();  // this is never called if a and b do not overlap
}

int main() {
    std::vector<int> v1 = make_v1();
    std::vector<int> v2 = make_v2();

    foo(v1.begin(), v2.begin());
}

As you can see in the compiled code, there is no assumption made optimizing the call to dreaded_function. This is just an example of pointer aliasing, just to diagnose the situation. There are well-known cases of pointer aliasing creating worst performance problems.

https://godbolt.org/z/345PKrsnx

What high-level hints can I give the compiler, in C++ (or perhaps using extensions), that the iterator ranges are at memory regions that cannot overlap?

If this were C, or if I were using pointers and C++ extensions, I could use __restrict and a pointer interface, but I want something more high-level.

The only option that I found was to be very explicit about the copy:

...
std::vector<int> v1 = static_cast<std::vector<int> const&>(make_v1());
std::vector<int> v2 = static_cast<std::vector<int> const&>(make_v2());

This at least convinces clang that it can make the optimization (and at the cost of an extra copy I am pretty sure).

https://godbolt.org/z/nYfKeTKqj

But this is ugly and it is also making a copy, also if make_v becomes an included function later, the solution will have a cost. (Also it still doesn't convince GCC!)

Is this the only way I can assure the compiler that v1 and v2 memory do not overlap? Is there a more streamlined solution? Should [[assume]] work for this eventually?

At this point, the best I am hoping is some keyword/extension that "simulates" that the result of make_v1/make_v2 is copied into v1/v2.

15
  • 1
    Perhaps make local pointers to the iterator contents, and make them __restrict? Commented Oct 21, 2024 at 18:27
  • @HolyBlackCat, are you suggesting writing a non-generic foo like foo_restricted that takes restricted pointer and call it foo_restricted_contiguous(&*v1.begin(), &*v2.begin())? Commented Oct 21, 2024 at 18:30
  • 1
    Using __restrict__ actually works: godbolt.org/z/dr79r967z - Now I want restrict as a keyword in C++ too :-) Commented Oct 21, 2024 at 18:36
  • 1
    @alfC Yeah, it's perhaps possible to use the called function as a proxy that calls foo_detail(&*a, &*b); where that function uses __restrict__. Commented Oct 21, 2024 at 19:29
  • 1
    @TedLyngmo, I found a partial solution (below as an answer). Since I control the container (Multi-dimensional arrays) I can have a special member and iterator created with arr.RESTRICT_begin() or by make_restricted(arr.begin()). Not sure how it will scale in general. Commented Oct 24, 2024 at 9:02

2 Answers 2

3

I found a hacky way to do this. I realized that I can use the __restrict keyword in member (pointer) variables.

So, basically, if I "reimplement" vector iterator with a restrict pointer then I can get away with this, in principle.

In practice this only helps on GCC, and not in clang.

template<class Iterator>
struct restricted_vector_iterator {
    explicit restricted_vector_iterator(Iterator it) : p_{&*it} {}

    decltype(auto) operator*() const {return *p_;}

    decltype(auto) operator++(int) {++p_; return *this;}

 private:
    typename Iterator::pointer __restrict p_;
};


...
    foo(restricted_vector_iterator{v1.begin()}, restricted_vector_iterator{v2.begin()});

https://godbolt.org/z/oE1f1vnPr

This looks like casting the iterator to pointer but in generic code a restrict class could work differently for different iterator types. Also I am not modifying "library" code, only user code. (The user knows that the ranges do not overlap.)

Of course this is not ideal.

First, It doesn't scale well, because every critical iterator will have to be wrapped/reimplemented (__restrict cannot be applied to the iterator class, only to the contained pointer). This would be much simpler if __restrict worked recursively for the pointer member inside a class (specifically an iterator.)

Second, it sounds kind of dangerous that one can keep a restricted pointer alive in the wild, maybe I can make this wrapper iterator non-copyable or at least "hard to copy".

Third, it is not clear to me how pointer arithmetic will work (are pointers calculated from other restricted pointers, also restricted?, are restricted among them?)

Fourth, it is not portable, both because __restrict is an extension, and because at the end it only does the optimization on GCC it seems.

If you have any comments let me know.

Finally, if I had control over the container, I could have a different method like v1.RESTRICT_begin() for example.

Sign up to request clarification or add additional context in comments.

Comments

1

The general way to convince the compiler of an arbitrary boolean predicate is to __builtin_assume it, something like this:

#include <version>
#if defined(__clang__)
 #define ASSUME(x) __builtin_assume(x)
#elif defined(_MSC_VER)
 #define ASSUME(x) __assume(x)
#elif __has_builtin(__builtin_unreachable)
 #define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#elif defined(__cpp_lib_unreachable)
 #include <utility>
 #define ASSUME(x) do { if (!(x)) std::unreachable(); } while (0)
#endif

(GCC supports __builtin_unreachable but not __builtin_assume. MSVC supports it but only as __assume, not __builtin_assume.)

Unfortunately, no existing compiler seems to understand the following sequence (Godbolt):

int misunderstood(int *p, int *q) {
  ASSUME(p != q);
  *p = 1;
  *q = 2;
  return *p;  // should return "1", surely
}

OTOH, both GCC 10+ and Clang 11+ understand this sequence just fine:

bool understood(int *p, int *q) {
  ASSUME(p != q);
  *p = 1;
  *q = 2;
  return (p == q);  // returns "false"
}

so you'd think there's no actual reason they couldn't go on to understand that, if two pointers point different places, then modifying one place can't possibly modify the other. And vice versa, it's ironic that neither GCC nor Clang understand that two independently __restrict'ed pointers can't be equal. It's almost as if both compilers have an internal idea of "aliasing" which is somehow orthogonal to "equaling."

2 Comments

Yeah, I have to learn exactly how different compilers interpret __restrict. It could be in a non-standard non-uniform way. Although in this particular case it should work, I think it makes sense that aliasing and inequality are different things. aliasing might only make sense in relation to the * operation. Also inequalities should persist after arithmetic, restrict p, restrict q means (loosely at least) that q != p, but it also means that q + n != p. At least is what in my mind, should make __restrict useful.
Right, the set of pairs (p,q) such that "p equals q" ought to be a strict subset of the set of pairs (p,q) such that "p aliases q." But the examples above indicate that GCC and Clang don't understand that: they both seem to think it's possible that "p doesn't alias q (enforced via __restrict) and yet p equals q anyway." I think that's a missed-optimization bug in both compilers.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.