Do all pointers have the same size in C++?


Recently, I came across the following statement:

It's quite common for all pointers to have the same size, but it's technically possible for pointer types to have different sizes.

But then I came across this which states that:

While pointers are all the same size, as they just store a memory address, we have to know what kind of thing they are pointing TO.

Now, I am not sure which of the above statements is correct. The second quoted statement appears to come from the C++ course notes of the Computer Science department at Florida State University.


Here's why, in my opinion, all pointers should have the same size:

1) Say we have:

int i = 0;
void* ptr = &i; 

Now, suppose the C++ standard allows pointers to have different sizes. Further suppose that on some arbitrary machine/compiler (since the standard allows it), a void* has size 2 bytes while an int* has size 4 bytes.

Now, I think there is a problem here: the right-hand side is an int* of size 4 bytes, while the left-hand side is a void* of size 2 bytes. Thus, when the implicit conversion from int* to void* happens, some information will be lost.

2) All pointers hold addresses. Since for a given machine all addresses have the same size, it is very natural (logical) that all pointers should also have the same size.

Therefore, I think that the second quote is true.


My first question is what does the C++ standard say about this?

My second question is: if the C++ standard does allow pointers to have different sizes, is there a reason for it? Allowing pointers to have different sizes seems a bit unnatural to me (considering the two points I explained above), so I am pretty sure the standards committee has already thought about this and had a reason for allowing it. Note that I am asking this second question only if the standard does allow pointers to have different sizes.


There are 10 answers

Aganju

In practice, you’ll find that on nearly all modern systems (with ‘modern’ starting around 2000), all pointers within one system are the same size.
The permission to have different sizes comes from older systems using chips like the 8086, 80386, etc., where there were ‘near’ and ‘far’ pointers of obviously different sizes. It was the compiler’s (and sometimes the developer’s) job to sort out (and remember!) what goes in a near pointer and what goes in a far pointer.

C++ needs to stay compatible with those times and environments.

Brian Bi

While it might be tempting to conclude that all pointers are the same size because "pointers are just addresses, and addresses are just numbers of the same size", it is not guaranteed by the standard and thus cannot be relied upon.

The C++ standard explicitly guarantees that:

  • void* has the same size as char* ([basic.compound]/5)
  • T const*, T volatile*, and T const volatile* have the same size as T*. This is because cv-qualified versions of the same type are layout-compatible, and pointers to layout-compatible types have the same value representation ([basic.compound]/3).
  • Similarly, any two enum types with the same underlying type are layout-compatible ([dcl.enum]/9), therefore pointers to such enum types have the same size.

It is not guaranteed by the standard, but it is basically always true in practice, that pointers to all class types have the same size. The reason for this is as follows: a pointer to an incomplete class type is a complete type, meaning that you are entitled to ask the compiler sizeof(T*) even when T is an incomplete class type, and if you then ask the compiler sizeof(T*) again later in the translation unit after T has been defined, the result must be the same. Furthermore, the result must also be the same in every other translation unit where T is declared, even if it is never completed in another translation unit. Therefore, the compiler must be able to determine the size of T* without knowing what's inside T. Technically, compilers are still allowed to play some tricks, such as saying that if the class name starts with a particular prefix, then the compiler will assume that you want instances of that class to be subject to garbage collection, and make pointers to it longer than other pointers. In practice, compilers do not seem to use this freedom, and you can assume that pointers to different class types have the same size. If you rely on this assumption, you can put a static_assert in your program and say that it doesn't support the pathological platforms where the assumption is violated.

Also, in practice, it will generally be the case that

  • any two function pointer types have the same size,
  • any two pointer to data member types will have the same size, and
  • any two pointer to function member types will have the same size.

The reason for this is that you can always reinterpret_cast from one function pointer type to another and then back to the original type without losing information, and so on for the other two categories listed above ([expr.reinterpret.cast]). While a compiler is allowed to make them different sizes by giving them different amounts of padding, there is no practical reason to do this.

(However, MSVC has a mode where pointers to members do not necessarily have the same size. It is not due to different amounts of padding, but simply violates the standard. So if you rely on this in your code, you should probably put a static_assert.)

If you have a segmented architecture with near and far pointers, you should not expect them to have the same size. This is an exception to the rules above about certain pairs of pointer types generally having the same size.

bta

suppose the standard C++ allows pointers to have different sizes

The size, structure, and format of a pointer are determined by the architecture of the underlying CPU. Language standards can't make many demands about these things, because they are not something the compiler implementer can control. Instead, language specs focus on how pointers behave when used in code. The C99 Rationale document (a different language, but the reasoning is still valid) makes the following comments in section 6.3.2.3:

C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.

...

Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.

An easy example of this is a pure Harvard architecture computer. Executable instructions and data are stored in separate memory areas, each with separate signal pathways. A Harvard architecture system can use 32-bit pointers for data but only 16-bit pointers to a much smaller instruction memory pool.

The compiler implementer has to ensure that they generate code that both functions correctly on the target platform and behaves according to the rules in the language spec. Sometimes that means that all pointers are the same size, but not always.

The second reason for having all the pointer to be of the same size is that all pointer hold address. And since for a given machine all addresses have the same size

Neither of those statements is necessarily true. They're true on most common architectures in use today, but they don't have to be.

As an example, so-called "segmented" memory architectures can encode a memory reference in more than one way. References within the current memory "segment" can use a short "offset" value, whereas references to memory outside the current segment require two values: a segment ID plus an offset. In DOS on x86 these were called "near" and "far" pointers, respectively, and were 16 and 32 bits wide.

I've also seen some specialized chips (like DSPs) that used two bytes of memory to store a 12-bit pointer. The remaining four bits were flags that controlled the way memory was accessed (cached vs. uncached, etc.) The pointer contained the memory address, but it was more than just that.

What a language spec does with all of this is define a set of rules for how you can and cannot use pointers in your code, as well as what behavior should be observable for each pointer-related operation. As long as you stick to those rules, your program should behave according to the spec's description. It's the compiler writer's job to bridge the gap between the two and generate correct code without you having to know anything about the CPU architecture's quirks. If you go outside the spec and rely on unspecified or undefined behavior, those implementation details become relevant and you no longer have any guarantee about what will happen. I recommend enabling the compiler warning for conversions that result in a loss of data, and then treating that warning as a hard error.

gnasher729

I’ve seen actual code for a DSP that addressed memory in 16-bit units. So if you took a pointer to int, interpreted its bits as an integer, and increased that by one, the pointer would point to the next 16-bit int.

On this system, char was also 16 bits. If char had been 8 bits, then a char* would have been an int pointer with at least one additional bit.

Juan

In modern C++, the standard library provides smart pointers such as std::unique_ptr and std::shared_ptr. A std::unique_ptr can be the same size as a regular pointer when it has no stateful deleter stored with it (for example, with the default deleter). A std::shared_ptr may be larger, since it stores the pointer to the object and also a pointer to a control block that maintains the reference counts and the deleter. The control block can be allocated together with the object (using std::make_shared), which makes that single allocation slightly bigger than the object alone.

See this interesting question: Why is the size of make_shared two pointers?

kackle123

As an embedded programmer, I wonder whether even these C languages have taken us too far from the machine! :)

The father of these languages, C, was used to build low-level systems. Part of the reason different pointer types need not be the same size is that they can refer to physically different memories. That is, data at different memory addresses can actually be located on separate integrated circuits (ICs)! For example, constant data might be located on one non-volatile IC, volatile variables on another IC, etc. A memory IC might be designed to be accessed 1 byte at a time, or 4 bytes at a time, etc., which affects what "pointer++" has to do.

What if a particular memory bus/address space is only a byte wide? (I've worked with those before.) Then a 64-bit pointer value such as 0xFFFFFFFFFFFFFFFF would be wasteful, and perhaps unsafe, when only 8 bits of address are meaningful.

Davislor

In addition to the requirements of the C++ standard, any implementation that supports the UNIX dlsym() library call must be able to convert between function pointers and void* without loss. All function pointers must also be the same size.

There have been architectures in the real world where different kinds of pointers have different sizes. One formerly very mainstream example was MS-DOS, where the Compact and Medium memory models could make code pointers larger than data pointers or vice versa. In segmented memory, it was also possible to have object pointers that were different sizes (such as near and far pointers). Finally, some old mainframes had complex pointers that could be different sizes for different types of objects, and fat pointers are even making a comeback on ARM64.

Aconcagua

Member function pointers can differ:

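#include <cstddef>
#include <iostream>
#include <string>

int main() {
    void* ptr;                            // ordinary object pointer
    std::size_t (std::string::*mptr)();   // pointer to member function of std::string

    std::cout << sizeof(ptr) << '\n';
    std::cout << sizeof(mptr) << std::endl;
}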

This printed

8
16

on my system. The background is that member function pointers need to hold additional information, e.g. whether the function is virtual, or how to adjust the this pointer when multiple inheritance is involved.

Historically, there were systems with 'near' and 'far' pointers that differed in size as well (16 vs. 32 bits); as far as I am aware, they don't play any role nowadays, though.

MSalters

Your reasoning in the first case is half-correct. void* must be able to hold any int* value. But the reverse is not true. Hence, it's quite possible for void* to be bigger than int*.

The statement also gets more complex if you include other pointer types, such as pointers to functions and pointers to methods.

One of the cases the C++ Standards committee considered was DSP chips, where the hardware word size is 16 bits but char is implemented as a half-word. This means char* and void* need one extra bit compared to short* and int*.

Bathsheba

A few rules:

  1. The sizes of plain-old-data pointers can differ, e.g. double* can be (and often is) larger than int*. (Think of architectures with off-board floating point units.)

  2. void* must be sufficiently large to hold any object pointer type.

  3. The size of any non-plain-old-data pointer is the same as any other. In other words sizeof(myclass*) == sizeof(yourclass*).

  4. sizeof(const T*) is the same as sizeof(T*) for any T, plain-old-data or otherwise.

  5. Member function pointers are not pointers. Pointers to non-member functions, including static member functions, are pointers.