Can pointers to different types have different binary representations?

447 views Asked by At

I wonder if C++ implementations are allowed to represent pointers to different types differently. For instance, if we had 4-byte sized/aligned int and 8-byte sized/aligned long, would it be possible to represent pointers-to-int/long as object addresses shifted right by 2/3 bits, respectively? This would effectively forbid to convert a pointer-to-long into a pointer-to-int.

I am asking because of [expr.reinterpret.cast/7]:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).

[Note 7: Converting a pointer of type “pointer to T1” that points to an object of type T1 to the type “pointer to T2” (where T2 is an object type and the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note]

The first sentence suggests that we can convert pointers to any two object types. However, the empathized text in the (not normative) Note 7 then says that the alignment plays some role here as well. (That's why I came up with that int-long example above.)

2

There are 2 answers

8
Yakk - Adam Nevraumont On BEST ANSWER

Yep

As a concrete example, there is a C++ implementation where pointers to single-byte elements are larger than pointers to multi-byte elements, because the hardware uses word (not byte) addressing. To emulate byte pointers, C++ uses a hardware pointer plus an extra byte offset.

void* stores that extra offset, but int* does not. Converting int* to char* works (as it must under the standard), but char* to int* loses that offset (which your note implicitly permits).

The Cray T90 supercomputer is an example of such hardware.

I will see if I can find the standards argument why this is valid thing for a compliant C++ compiler to do; I am only aware someone did it, not that it is legal to do it, but that note rather implies it is intended to be legal.

The rules are going to be in the to-from void pointer casting rules. The paragraph you quoted implicitly forwards the meaning of the conversion to there.

7.6.1.9 Static cast [expr.static.cast]

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

This demonstrates that converting to more-aligned types generates an unspecified pointer, but converting to equal-or-less aligned types that aren't actually there does not change the pointer value.

Which is permission to make a cast from a pointer to 4 byte aligned data converted to a pointer to 8 byte aligned data result in garbage.

Every object unrelated pointer cast needs to logically round-trip through a void* however.

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).

(From the OP)

That covers void* to T*; I have yet to find the T* to void* conversion text to make this a complete level answer.

0
Serge Ballesta On

The answer is yes. Simply because as the standard does not forbid it, an implementation could decide to have different representations for pointers to different types, or even different possible representations for a same pointer.

As most architecture now use flat addressing (meaning that the representation of the pointer is just the address), there are no good reason to do that. But I can still remember the old segment:offset address representation of the 8086 systems, that used to allow 16 bits systems to process 20 bits addresses (1024k). It used a 16 bit segment address (shifted by 4 bits to get a real address), and an offset of 16 bits for far pointers, or only 16 bits (relative to the current segment) for near addresses. In this mode, far pointers had a bunch of possible representations. BTW, far addressing was the default (so what was produced by normal source) in the large and compact mode (ref).