
Herb Sutter refers to "core language undefined behavior" in Peering Forward - C++’s Next Decade - Herb Sutter - CppCon 2024, and he stresses "language" several times (e.g. here), so I'd like to understand how to tell it apart from UB in general (I'm not sure whether I also heard "library undefined behavior" in the same talk).


Is language UB something like this

int x = std::numeric_limits<int>::max() + 1;

because I'm invoking UB by overflowing an int, which is not a library entity, as opposed to something like this

std::vector<int> v = getvec();
v[v.size()] = 3;

where I'm doing an out-of-bounds access on a library entity like std::vector? (Well, in the end I'm still overflowing an int[], though...)

  • It appears that Herb is referring to core language undefined behavior in a context that requires a constant expression. Neither of your examples is in a context that requires a constant expression. Commented Jun 17 at 14:52
  • Core constant expressions are expressions that by definition cannot have any UB, and can thus be evaluated at compile time. Thus, if such an expression would lead to UB, it must result in a compilation error. In other, loosely phrased, words: "normal" UB will only manifest at runtime. Commented Jun 17 at 15:20
  • @PepijnKramer "Core constant expressions are expressions that by definition cannot have any UB": only specified UB is required to be caught. Implicit UB is still possible. Commented Jun 17 at 16:44
  • @LanguageLawyer That's a fine line... but yeah, if that distinction weren't there, it would be the halting problem all over again ;) (In practice I'm not such a detailed language lawyer and just have a slightly more practical mental model, to stay away from the real edge cases ;) ) Commented Jun 17 at 17:02
  • You might not be overflowing an int array if v.size() != v.capacity(). The UB is in the call to operator[] itself. Similar situation: std::string s = getstring(); s[s.size()] = 0; is well defined, but s[s.size()] = 1; is UB Commented Jul 16 at 3:45

2 Answers


The slide you refer to mentions:

C++ already has great gobs of production UB-free code (*)

(*) core language UB in a context that requires a constant expression

The standard defines the term "undefined behaviour" in [defns.undefined] (3.65) and adds more detail in a note:

3.65 [defns.undefined]

undefined behavior

behavior for which this document imposes no requirements

[Note 1: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an incorrect construct or invalid data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message ([defns.diagnostic])), to terminating a translation or execution (with the issuance of a diagnostic message). Many incorrect program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression ([expr.const]) never exhibits behavior explicitly specified as undefined in [intro] through [cpp]. — end note]

Note the last sentence. This is what the slide refers to with "production UB-free code", though it holds only for constant expressions, and only for behavior explicitly specified as undefined.

For example, you cannot dereference a null pointer (https://eel.is/c++draft/expr.unary.op#1). Outside of a constant expression, doing so is undefined; during evaluation of a constant expression, the compiler has to diagnose it:

#include <array>

constexpr int foo() {
    int * x = 0; 
    return *x;
}

int main() {
    constexpr auto x = foo();
}

Error message from clang:

<source>:3:15: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
    3 | constexpr int foo() {
      |               ^~~
<source>:5:12: note: read of dereferenced null pointer is not allowed in a constant expression
    5 |     return *x;
      |            ^
<source>:9:20: error: constexpr variable 'x' must be initialized by a constant expression
    9 |     constexpr auto x = foo();
      |                    ^   ~~~~~
<source>:5:12: note: read of dereferenced null pointer is not allowed in a constant expression
    5 |     return *x;
      |            ^
<source>:9:24: note: in call to 'foo()'
    9 |     constexpr auto x = foo();
      |                        ^~~~~

If you remove constexpr from the code, the behavior is just undefined and the compiler emits no error (the program crashes with a segfault here, but it could do anything else).
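The signed-overflow example from the question behaves the same way. Below is a minimal sketch (my own illustration, not from the talk; the names add and add_would_overflow are made up, C++17 assumed): an overflowing addition in a context that requires a constant expression must be rejected, and the overflow check is written so that it never performs the overflowing addition itself.

```cpp
#include <limits>

// Plain addition: fine at run time as long as the result fits in an int.
constexpr int add(int a, int b) { return a + b; }

// Checks whether a + b would overflow without ever performing the
// overflowing addition, so the check itself is free of UB.
constexpr bool add_would_overflow(int a, int b) {
    constexpr int hi = std::numeric_limits<int>::max();
    constexpr int lo = std::numeric_limits<int>::min();
    return (b > 0 && a > hi - b) || (b < 0 && a < lo - b);
}

static_assert(add(1, 2) == 3);  // no overflow: a valid constant expression
static_assert(add_would_overflow(std::numeric_limits<int>::max(), 1));
static_assert(!add_would_overflow(std::numeric_limits<int>::max(), 0));

// Uncommenting the next line makes the program ill-formed: the initializer
// must be a constant expression, and signed overflow is UB explicitly
// specified by the core language, so the compiler has to diagnose it.
// constexpr int bad = add(std::numeric_limits<int>::max(), 1);
```

Without constexpr on bad, the same call would compile and simply have undefined behavior at run time.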

As opposed to the cases where the standard explicitly specifies something as undefined, there are cases that are implicitly undefined simply because the standard does not define them. Compilers cannot diagnose all of those, even in constexpr contexts.


6 Comments

"It's not possible for compilers to diagnose all of that even in constexpr contexts": does this mean I can write a constant expression invoking implicit UB and the compiler will just be OK with it?
@Enlico To my understanding, yes. It's hard to find an example of implicit UB. I am not a language lawyer, but I am sure there are things that the standard neither defines nor explicitly specifies as undefined.
I suppose that's why Herb makes a gesture meaning "just a little bit" after asking "How much language UB can you have in constexpr?" at 36:08?
Ok, yeah, that makes sense. But as regards my answer, are you saying that "core language UB" coincides with "behavior explicitly specified as undefined"?
@Enlico yes, "in [intro] through [cpp]", i.e. excluding the standard library, as explained in the other answer. I admit I only understood this point after reading the other answer (and the other answer doesn't mention that there might be UB that isn't explicitly specified).
"It's hard to find an example of implicit UB": maybe stackoverflow.com/a/79574300

Your understanding is mostly correct.

"Core language" means the language itself as opposed to its standard library.

UB in the core language must always be caught in constexpr contexts (per [expr.const]/10.8), as opposed to UB in the library, which isn't necessarily caught in constexpr (the standard leaves it unspecified whether it is).

As noted in the other answer, "implicit" UB also might not be caught, even in the core language (implicit = UB because the standard doesn't describe the behavior, as opposed to explicitly banning it).
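To make the core-language vs library distinction concrete, here is a minimal sketch (my own illustration; read_raw and read_std are made-up names, C++17 assumed so that std::array's non-const operator[] is constexpr). Both functions do the same out-of-bounds read, but only the built-in array version is guaranteed to be rejected during constant evaluation.

```cpp
#include <array>
#include <cstddef>

// Out-of-bounds read on a built-in array: core-language UB, so the compiler
// must reject it when it happens during constant evaluation.
constexpr int read_raw(std::size_t i) {
    int a[3] = {1, 2, 3};
    return a[i];
}

// The same read through std::array::operator[]: an out-of-bounds index
// violates a library precondition, and the standard leaves it unspecified
// whether that is diagnosed during constant evaluation.
constexpr int read_std(std::size_t i) {
    std::array<int, 3> a{1, 2, 3};
    return a[i];
}

static_assert(read_raw(2) == 3);
static_assert(read_std(2) == 3);

// constexpr int r1 = read_raw(3);  // must be rejected: core-language UB
// constexpr int r2 = read_std(3);  // rejection not guaranteed by the standard
```

Whether the second commented-out line is rejected depends on how the implementation writes operator[], which is exactly the "library UB isn't necessarily caught" point.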


...as opposed to something like this

std::vector<int> v = getvec();
v[v.size()] = 3;

Where I'm doing out-of-bound access for a library entity like std::vector? (Well, at the end I'm still overflowing an int[], though...)

(bold mine)

Nope. Formally, UB can't happen inside a standard library function.

The spec for operator[] says that if you pass an out-of-bounds index, you get UB. You get UB immediately when you call it with a bad index, regardless of what it does internally.
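Since the UB is tied to the call itself, the way to avoid it is to check the precondition (index < size()) before calling operator[]. A minimal sketch, assuming a hypothetical helper try_assign (not a standard library function) and C++17 for the compile-time check:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of the standard library): it checks the
// precondition of operator[] (index < size()) itself, so a bad index is
// rejected before operator[] is ever called and no library UB occurs.
template <class Container, class T>
constexpr bool try_assign(Container& c, std::size_t i, const T& value) {
    if (i >= c.size()) {
        return false;  // calling c[i] here would violate the precondition
    }
    c[i] = value;
    return true;
}

// Compile-time check of the helper using std::array.
constexpr bool demo() {
    std::array<int, 3> a{1, 2, 3};
    const bool rejected = !try_assign(a, a.size(), 3);  // like v[v.size()] = 3
    const bool accepted = try_assign(a, 0, 42);         // in range: assigned
    return rejected && accepted && a[0] == 42;
}

static_assert(demo());
```

The same template also accepts a std::vector at run time, since it only uses size() and operator[]; std::vector::at() is the standard alternative that throws std::out_of_range instead.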

What it does internally is an implementation detail; it's not subject to the same rules as user code. Compilers may or may not be more lenient about what happens in the system headers.

4 Comments

v[v.size()] = 3; eventually performs an out-of-bounds access, which is diagnosed: godbolt.org/z/hz8fhG7v5. I guess, strictly speaking, a conforming implementation is not required to let v[v.size()] = 3; access an out-of-bounds element (it could do anything), so what you say is correct. I think most "library UB" boils down to language UB (like v[v.size()] = 3; being undefined because an out-of-bounds access is undefined), but one cannot count on it.
Practically, buffer overflows in user code and in the standard library may be handled by compilers the same way, but only the former is called "UB". The standard library is not implemented in C++ proper (standard-conforming C++); how it's implemented is an implementation detail. It doesn't conform to the spec (it implements it), so talking about "UB" inside of it (a term defined in the spec) doesn't really make sense. Instead, the UB happens at the standard library boundary, when you call a standard library function in a way that violates its preconditions.
"...when you call a standard library function in a way that violates its preconditions" A fancy way of saying the calling code has a bug.
The fact that the diagnostic looks the same way as a buffer overflow UB in user code would look is formally a coincidence. Those are two different kinds of UB (precondition violation vs array overflow).
