I've recently been reading the C standard (ISO/IEC 9899:2018). In it, Section 6.5.6 (Additive operators) describes constraints on the + operator. Rule [8] says:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Apart from the behaviour of arithmetic on arrays, I've understood this rule as also implying that, since null pointers don't point to any object, arithmetic on them is undefined behaviour. (A Reddit post I found on this topic supports my conclusion.)

Firstly: Is my comprehension of the above statement correct?

Secondly: What about the code below? Is it also UB, and why or why not? Do the rules described in Section 6.5.6 also apply to intptr_t? The code doesn't seem to violate the rules in Section 6.3.2.3 (Pointers) either.

int arr[2] = {0};
intptr_t ptr_0 = 0;
intptr_t ptr = (intptr_t) arr;

intptr_t new = ptr_0 + ptr;
int* ptr_int =  (int*) new; 

Some other sections that provide more context:

  • Section 6.3.2.3 (Pointers) - Rules [1-6]
  • Section 6.3.2.2 (Void)
  • Section 6.2.5 (Types) - Rules [1,19,20,28]

Thanks for your help in advance!

  • How does it make any sense to add pointers? Commented Apr 18, 2024 at 22:54
  • intptr_t is not a pointer, it's an integer. The rules for pointer addition are irrelevant. Commented Apr 18, 2024 at 22:55
  • intptr_t is a signed integer type, so adding two values whose sum overflows is undefined. Commented Apr 18, 2024 at 23:04
  • new is identical to ptr, so the code snippet is well-defined, but ptr_0 does not necessarily correspond to a null pointer. Commented Apr 19, 2024 at 6:17

2 Answers

intptr_t is just an ordinary signed integer type. You can use it just like any other integer, the rules about pointers do not apply to it. It's similar to the types int32_t and int64_t -- they exist so programmers don't have to guess whether to use int, long, long long, etc. to get a type large enough for their needs (for historical reasons, the basic types were defined to be the natural types of the CPU/memory architecture, rather than specific sizes that the programmer could count on).

What's special about it (and also uintptr_t) is that it's guaranteed to be large enough to hold any void* pointer (and since any object pointer can be converted to void* without any loss of information, this also means it's large enough to hold any object pointer). So you can cast a pointer to intptr_t, cast the unmodified result back, and get a pointer that compares equal to the original.
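As a minimal sketch of that round trip (the helper names round_trip_preserves and check_round_trip are made up for illustration, and the code assumes the implementation provides the optional intptr_t type):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer): converts p to intptr_t
   and back without touching the integer in between, then reports
   whether the result compares equal to the original pointer. */
bool round_trip_preserves(int *p)
{
    /* Go through void * because that is the conversion the standard
       describes for intptr_t. */
    intptr_t as_int = (intptr_t)(void *)p;
    int *back = (int *)(void *)as_int;
    return back == p;
}

/* Small self-check using an array like the one in the question. */
void check_round_trip(void)
{
    int arr[2] = {0, 1};
    assert(round_trip_preserves(arr));
    assert(round_trip_preserves(&arr[1]));
}
```

Calling check_round_trip() should pass silently on a conforming implementation, because the integer is never modified between the two conversions.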

However, there's no such guarantee if you modify the value of the intptr_t variable. The following is undefined:

int arr[2] = {0, 1};
intptr_t iptr = (intptr_t) arr;
iptr += sizeof(int);
int *ptr = (int *)iptr;
printf("%d\n", *ptr);

In most implementations I'd expect this to print 1, but it's not required by the specification.
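For comparison, here is a sketch of the same access done entirely in the pointer domain, where Section 6.5.6 does define the result. The helper name element_at and its len parameter are assumptions added for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the element at index i by doing the
   arithmetic on the pointer itself rather than on an intptr_t.
   The caller must pass the real array length in len. */
int element_at(const int *base, size_t len, size_t i)
{
    assert(i < len);          /* stay inside the array object */
    const int *p = base + i;  /* pointer arithmetic within the array */
    return *p;
}
```

With int arr[2] = {0, 1};, element_at(arr, 2, 1) returns 1, and that result is required by the standard rather than merely expected.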

5 Comments

Why would that code be UB? Here: int *ptr = (int *)iptr; ptr is a pointer to an unknown area. The compiler cannot make assumptions about what's stored there, it need not and probably cannot keep track of the "effective type" pointed at by ptr. Any pointer casts to/from addresses unknown by the compiler are in the realm of implementation-defined/unspecified behavior rather than UB. However if we modify the contents *ptr = 2; and then print arr[1], then it may still print 1 since the compiler doesn't need to know that ptr is an alias for &arr[1].
I think one problem is that there is no assumption about the representation of pointers in the Standard so there is no knowing what adding in the integer domain does in the pointer domain. For example, a system could have a number of memory banks and for some reason choose to represent pointers in 32 bits with the upper 24 bits being an address into a bank and the lower 8 bits being a bank number. Adding a small integer would then not behave anything close to the expectation. The example may be far-fetched, but the point is that it does not contradict the Standard.
@Lundin The compiler doesn't need to "keep track" of anything. It's the programmer's responsibility to avoid doing things that have no specified result.
So how exactly is this UB, and according to which part of the standard?
It's UB because nothing in the standard defines what it means to cast an arbitrary integer to a pointer. Only pointer->integer->pointer is defined, where integer is unmodified.

The Standard generally regards conversions between integers and pointers as yielding Implementation-Defined Behavior. Most implementations define the behavior in such a manner that some or all of the following will apply, but the Standard makes no distinction between those that offer these guarantees and those which don't:

  1. A conversion between a pointer and an integer will yield the integer with the same bit pattern as the pointer and vice versa.

  2. Pointer arithmetic will behave in linear octet-based fashion, such that (uintptr_t)((char*)p+n) will equal ((uintptr_t)(char*)p)+n, in all cases where the pointer arithmetic in the former would have defined behavior.

  3. If conversion of an un-restricted pointer p of some type T* to an integer would yield a particular number, conversion of an integer having that value to a T* will yield a pointer that may be used interchangeably with p, regardless of how the integer was computed.

  4. Computations and comparisons involving integers formed from pointers will behave the same way as they would be processed with integers computed via any other means, at least in cases where the results of such computations are not converted to pointers.

If an implementation documents that its integer/pointer conversion semantics satisfy any or all of the above guarantees, code that exploits such guarantees would have defined behavior on that implementation, but may not have defined behavior on implementations that don't offer such guarantees. When C89 was written, implementations that upheld #1 and #2 would invariably uphold #3 and #4; there was no perceived reason for implementations to document that they upheld such guarantees, because nobody would ever have imagined them doing otherwise. Neither clang nor gcc, however, consistently upholds #3 and #4 when optimizations are enabled.
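Guarantee #2 can be probed for a particular object and offset with a sketch like the one below. The helper name conversion_is_linear is hypothetical, the code assumes uintptr_t is available, and a true result only describes the implementation it runs on; it says nothing about what the Standard requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if, for this object and offset,
   converting after the pointer arithmetic gives the same integer as
   doing the arithmetic after the conversion (guarantee #2).
   n must not exceed the size of the object p points into, so that
   the pointer arithmetic itself has defined behavior. */
bool conversion_is_linear(void *p, size_t n)
{
    uintptr_t via_pointer = (uintptr_t)((char *)p + n);
    uintptr_t via_integer = (uintptr_t)(char *)p + n;
    return via_pointer == via_integer;
}
```

On mainstream flat-address-space targets, a call such as conversion_is_linear(arr, sizeof(int)) for an int arr[2] would be expected to return true. A segmented or banked architecture could legitimately return false.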
