
This question may be a bit controversial. I have the following code at block scope:

int *a = malloc(3 * sizeof(int));
if (!a) { ... error handling ... }
a[0] = 0;
a[1] = 1;
a[2] = 2;

I argue that this code invokes UB due to pointer arithmetic outside of bounds. The reason is that the effective type of the object pointed to by a is never set to int[3], but only to int. Therefore any access to the object at an index other than 0 is not defined by the C standard.

Here is why:

Line a = malloc(...): if the allocation succeeds, then a points to a region large enough to store 3 ints.

a[0] = ... is equivalent to *a = ..., an lvalue of type int. It sets the effective type of the first sizeof(int) bytes to int, as indicated by rule 6.5p6:

... For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

Now the pointer a points to an object of type int, not int[3].

a[1] = ... is equivalent to *(a + 1) = .... The expression a + 1 points one past the end of the int object accessible through *a. This pointer itself is valid for comparison, but dereferencing it is undefined due to:

Rule 6.5.6p7:

... a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

And rule 6.5.6p8:

... If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

A similar issue applies to a[2] = ..., but here even the computation of a + 2 hidden inside a[2] invokes UB.
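
To make the chain of reasoning concrete, here is the same snippet with the indexing spelled out the way 6.5.2.1p2 defines it (E1[E2] is identical to (*((E1)+(E2)))); the comments mark which subexpression each step of the argument targets:

int *a = malloc(3 * sizeof(int));
if (!a) { /* ... error handling ... */ }
*a = 0;        /* a[0]: sets the effective type of the first sizeof(int) bytes to int */
*(a + 1) = 1;  /* a[1]: forming a + 1 is allowed, but the dereference falls under 6.5.6p8 */
*(a + 2) = 2;  /* a[2]: under this reading, even computing a + 2 is out of bounds */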

The issue could be resolved if the standard allowed arbitrary pointer arithmetic within a valid region of memory as long as alignment requirements and the strict aliasing rule are satisfied, or if it stated that any collection of consecutive objects of the same type can be treated as an array. However, I was not able to find such a provision.
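
For comparison, here is a sketch of an allocation style that makes the array type explicit (this is ordinary standard C; whether it actually dodges the effective-type objection above is debatable, but the indexing is at least performed within an lvalue of type int[3]):

int (*p)[3] = malloc(sizeof *p);   /* p has type "pointer to array of 3 int" */
if (!p) { /* ... error handling ... */ }
(*p)[0] = 0;
(*p)[1] = 1;
(*p)[2] = 2;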

If my interpretation of the standard is correct, then some C code (all of it?) would be undefined. Therefore this is one of those rare cases where I hope that I am wrong.

Am I?

  • You're correct that a doesn't point to an object of type int[3]. One reason is that a pointer to int[3] would have the type int (*)[3], which is very different from the type of a. Instead, a + i (for any valid index i, including 0) points to an int. Commented Dec 1, 2021 at 13:31
  • 7.22.3 Memory management functions ".... and then used to access such an object or an array of such objects in the space allocated ..." is probably relevant. That usage of malloc is all over the place in C; you're overthinking this. Commented Dec 1, 2021 at 13:33
  • The effective type and strict aliasing rules are plain broken, and this is one such example. However, the rule that pointer arithmetic is only allowed within an array is equally broken whenever it is applied to a chunk of data of unknown (effective) type. You get the same problems whenever doing pointer arithmetic on, for example, a map of hardware registers in a microcontroller (see the sketch after these comments). The C standard doesn't generally acknowledge that there can be things placed in the address space which were not placed there by a C compiler. Commented Dec 1, 2021 at 13:34
  • @Mat, yes, I'm overthinking, but the language-lawyer tag is exactly for overthinking things. The wording from 7.22.3 looks relevant, but it contradicts other, more explicit rules. Commented Dec 1, 2021 at 13:39
  • @Mat Rather, whoever came up with the rules of effective type was "underthinking" this. They don't address arrays/aggregate types, nor do they address type qualifiers. The whole of 6.5 §6-§7 could be replaced with "here the implementation can puzzle things together between the lines as it pleases, in an undocumented manner". All of this boils down to quality of implementation in the end. Commented Dec 1, 2021 at 13:40
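
As an illustration of the register-map scenario mentioned in the comments, here is a sketch (the base address, register count, and function name are invented for the example): the storage was never created by any C object definition, so under the strict reading there is no array for the pointer arithmetic to stay inside of, yet code like this is routine in embedded work.

#define UART_BASE ((volatile unsigned int *)0x40001000u)  /* hypothetical memory-mapped base address */

void uart_clear_regs(void)
{
    /* Walk four consecutive hardware registers; no C object of array type exists here. */
    for (int i = 0; i < 4; i++)
        UART_BASE[i] = 0u;
}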

1 Answer


The Standard only "halfway" defines the term "object": it says that every object is a region of storage, but it does not specify when a region of storage is or is not an object. For most of the Standard, it would be fine to say that every region of storage simultaneously contains all objects of all types that will fit therein; any action which modifies an object modifies the underlying storage, and any action which modifies the underlying storage modifies the stored value of all objects therein.

I think it's fairly clear that the authors of the Standard expected that in cases where the Standard says an action invokes Undefined Behavior, but the behavior would be defined in the absence of that statement, quality implementations should behave in the defined fashion in cases where their customers would find that useful. The question of which cases those are, however, is a Quality of Implementation issue outside the Standard's jurisdiction. As such, it didn't really matter if the Standard characterized as Undefined Behavior some action which all implementations to date had processed in the same obviously-useful fashion, because nobody seeking to sell compilers would interpret the Standard's failure to mandate such a behavior as an invitation to deviate from it in ways that would be detrimental to their customers.

Because different compilers are used for different purposes, the only way the Standard could actually define all the behaviors needed for many low-level programming tasks, while also allowing all of the optimizations that would be useful for high-end number crunching, would be either to recognize categories of implementations that make different optimizations, or to add better means of inviting optimizations that would usefully improve performance and blocking those that would result in incorrect program behavior. Every compiler that has ever existed, or will plausibly ever exist, refrains from some optimizations that would otherwise have been useful, and/or performs "optimizations" which incorrectly process some Strictly Conforming C11 programs. Consequently, the question of whether the Standard would allow a silly optimization should only be relevant to people who either want to write poor-quality compilers, or who want to bend over backward to be compatible with them.


6 Comments

"...because nobody seeking to sell compilers would interpret the Standard's failure to mandate such a behavior as an invitation to deviate from it..." Optimizing compilers are not far from that when they take advantage of potential undefined behavior to generate counter-intuitive optimisations and break existing code that was not fully defined but ran fine with the previous state of the art.
@chqrlie: Perhaps I should have re-amplified the part of the text I'd italicized above: ...to deviate from it in any way that would make the compiler less useful for their customers. For most purposes a compiler that can meaningfully process a wide range of non-portable programs would be more useful than one that could not. Given float *floatPtr, there is no reason why a quality compiler should, absent some unusual configuration options, assume that an access to *(unsigned*)floatPtr wouldn't access an object of type float (a sketch of this construct follows these comments). Actually, if one recognizes the principle that...
...an access made via an lvalue whose address is freshly visibly derived from one of a particular type should be recognized as being an access of that type in cases where the latter would be defined, but left the meaning of "freshly visibly derived" as a Quality of Implementation issue, that would be much more workable for programmers and most compiler writers alike, at least for people who aren't having to maintain compilers whose front-ends strip out information necessary to support such constructs.
So is the answer to the question that this is an example of "technical UB"? A kind of UB that all relevant/useful implementations of C define in the same way? It looks like some kind of defect in the standard.
@tstanisl: The Standard was never intended to describe all of the situations in which implementations claiming suitability for any particular purpose should be expected to behave usefully. The fact that it doesn't do so isn't really a defect. The primary failing is its failure to make clear that it waives jurisdiction over many correct but non-portable programs, and that waiver of jurisdiction over a program's behavior does not imply any judgment that the program should be viewed as "erroneous" or "broken".
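
For reference, here is a minimal sketch of the kind of construct discussed in these comments (the helper name is made up; whether this access is well-defined is exactly what is being debated, and compilers commonly need options such as -fno-strict-aliasing before they promise anything about it). It assumes sizeof(float) == sizeof(unsigned):

#include <stdio.h>

/* Read the representation of a float through a freshly derived unsigned pointer. */
static unsigned bits_of(float *floatPtr)
{
    return *(unsigned *)floatPtr;
}

int main(void)
{
    float f = 1.0f;
    printf("%08x\n", bits_of(&f));
    return 0;
}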