Does casting a char * to another pointer type break the strict aliasing rule when the memory is from malloc?


I read that char * (and its signed and unsigned counterparts) can alias any type without violating the strict aliasing rule. However, having a char * point to an int variable and casting that char * to a double * breaks the rules, because the underlying object is of type int. But what if the memory comes from malloc? For example:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    void *buffer = malloc(32);
    unsigned char *ptr = buffer;

    *ptr = 10;
    *((double *)(ptr + 1)) = 3.14;
    *((double *)(ptr + 9)) = 2.718;

    printf("*ptr: %d\n", *ptr);
    printf("*(ptr + 1): %lf\n", *((double *)(ptr + 1)));
    printf("*(ptr + 9): %lf\n", *((double *)(ptr + 9)));
    
    return 0;
}

This prints the following:

*ptr: 10
*(ptr + 1): 3.140000
*(ptr + 9): 2.718000

Correct me if I'm wrong, but as far as I know, memory from malloc is untyped and can store any data, unlike an int array, which can only store data of type int.

I haven't received any warning from gcc, but apparently it is not very reliable at warning you when you break the strict aliasing rules. So does my example break them?


There are 3 answers

Answer by Weijun Zhou (accepted)

Your code has undefined behavior due to invalid access by misaligned pointers.

The pointer returned by malloc is guaranteed to be suitably aligned for any object type with a fundamental alignment requirement, so that the memory can be used for different types without issues. That's why it is valid to do

double* ptr = malloc(42*sizeof(double));

On the other hand, ptr + 1 is not guaranteed to be properly aligned for double. In fact, since ptr itself is properly aligned, ptr + 1 is very likely misaligned, and run-time undefined-behavior checkers report exactly this.

The next question is what the answer to your question would be if we changed the code so that the pointers are properly aligned.

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    void *buffer = malloc(32);
    unsigned char *ptr = buffer;

    *ptr = 10;
    *((double *)(ptr + _Alignof(double))) = 3.14; //(*)

    printf("*ptr: %d\n", *ptr);
    printf("*(ptr + _Alignof(double)): %lf\n", *((double *)(ptr + _Alignof(double))));

    free(buffer);
    return 0;
}

For the above modified code, the behavior is well-defined, assuming there are no out-of-bounds accesses to the allocated buffer. This follows from this quote from the latest standard draft (6.5, paragraph 6, in https://open-std.org/JTC1/SC22/WG14/www/docs/n3096.pdf):

The effective type of an object for an access to its stored value is the declared type of the object, if any.[98] If a value is stored into an object having no declared type through an lvalue having a type that is not a non-atomic character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

Where footnote 98 says

Allocated objects have no declared type.

At the line marked (*), we hit the second sentence of the quote above, and the effective type becomes double for the double object that now lives at the address ptr + _Alignof(double).

The following code after (*) would violate the strict aliasing rules, even though the pointer is properly aligned:

int i = *((int *)(ptr+_Alignof(double)));

Writing to the memory through an lvalue of a different type is allowed according to the second sentence of the quote above (the same rule that applied at (*)), and the intent is made quite clear by the explicit mention of "subsequent accesses that do not modify the stored value". This is in fact the basis of how memory pools work: they recycle memory previously used by other objects and reuse it for objects of new types. So the following is valid and updates the effective type of the associated memory (assuming the pointer is properly aligned):

*((int *)(ptr+_Alignof(double))) = 42;

However, as @JohnBollinger points out in a comment, many common C implementations are deficient with regard to effective-type updates via a write like the one above. Such implementations may perform incorrect type-based aliasing analysis and optimize the code incorrectly. So although the C standard says the above is valid, it is probably wiser not to do it directly. The memory-pool case is different, because the code that updates the effective type is often located in a different translation unit (TU), where incorrect type-based aliasing analysis by such deficient implementations cannot do much harm.

Answer by dave_thompson_085

Simply converting int* to double* (whether by a cast or not) doesn't violate strict aliasing; that happens only if you access (read and/or write) the object using lvalues of different non-character types (generally by dereferencing a pointer for at least one of them), and your example doesn't do that. (You access byte 0 only as unsigned char, and bytes 1 through 16 only as double.) How the memory was allocated doesn't matter.

There is a different rule: converting to a pointer type with a stricter alignment requirement may produce an invalid pointer value. The conversion itself is undefined behavior if the result is misaligned, and dereferencing such a pointer is where problems usually show up. Here malloc (and calloc and realloc) may differ from a declared variable, because they return memory that is suitably aligned for any type with a fundamental alignment, whereas a declaration such as int x may be placed at an address that is aligned for int but not for double. However, (char*)malloc(...) + 1, as your example uses, is almost certainly unaligned for double, just as &x could be. Most implementations today do not require alignment (except possibly for volatile), and if the one you use does not, then your example code is fine, although unaligned access is often slightly slower, and if you do a LOT of it (like trillions of accesses) it may make a noticeable difference.

Answer by Eric Postpischil

*((double *)(ptr + 1)) gives us three things to consider:

  1. Does the conversion to double * satisfy requirements, particularly alignment?
  2. Is the conversion to double * meaningful (is the resulting value sufficiently defined)?
  3. Is accessing the memory as a double defined?

1. Does the conversion satisfy requirements?

The type of ptr, and hence of ptr + 1, is unsigned char *, and C 2018 6.3.2.3 7 says “A pointer to an object type may be converted to a pointer to a different object type…” Since ptr + 1 is a pointer to an object type, it satisfies this. The paragraph continues “… If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined…” As others have pointed out, ptr + 1 may not be correctly aligned for the double type. It is very likely not optimally aligned, but whether it is “correctly” aligned is a matter for each C implementation. Some support accesses with any alignment, albeit with performance costs. (It is also allowed under the C standard, but not implemented in any C implementation I know of, for char and double to be the same size, in which case ptr + 1 would be correctly aligned for a double.)

2. Is the conversion meaningful?

C 2018 6.3.2.3 7 continues “… Otherwise, when converted back again, the result shall compare equal to the original pointer…” This is all the paragraph says about conversions to a pointer-to-object type other than pointers to character types. Note that it only tells you the pointer is useful for converting back to the original type. It does not tell you it is useful for accessing objects in the new type.

For example, given short Array[2]; short *p = (short *) &Array;, p has some value, and it has been converted from the address of Array (not, note well, from the address of Array[0]). But the standard does not tell us the result of the conversion is the address of Array[0]. It could be some value that is meaningless except that converting it back to short (*)[2] yields the address of Array. Consider, say, a C implementation that starts all arrays on 64-byte boundaries, because its hardware does a weird thing with forming 22-bit addresses from two 16-bit values (effectively (a << 6) + b), so keeping arrays on 64-byte boundaries means you only need 16 bits to record their addresses, whereas recording an arbitrary char address requires at least 22 bits. (There was hardware like this.) When you convert &Array to short * in this C implementation, it could give you a 22-bit short * value in which the 16 bits of &Array are not in a usable position for addressing a short. The only use would be for converting it back to short (*)[2].

You will not find this in normal modern C implementations, but it is allowed by the C standard.

Note that if we had this sequence:

void *p = malloc(sizeof (short [2]));
short *pShort = p;
short (*pArray0)[2] = p;
short (*pArray1)[2] = (short (*)[2]) pShort;

It is technically not defined by the standard that pArray0 compares equal to pArray1. malloc is defined to return a pointer that may be assigned to any pointer-to-object type with a fundamental alignment requirement and used to access an object of that type in the allocated space. That tells us pArray0 points to the allocated space and may be used to access an array of short there. But pArray1 was not converted directly from the pointer returned by malloc. It was passed through a short *. It is common to assume this works in normal C implementations, and we can reason it is expected, but the C standard does not explicitly guarantee this.

3. Is accessing the memory as a double defined?

Suppose all of the above are satisfied, so that (double *)(ptr + 1) gives us a correctly aligned pointer to space we may use for a double. Can we access a double there? This is addressed by the aliasing rules in C 2018 6.5 7, which says we may access (read or write) an object using only certain types, notably a type compatible with the effective type (which for double is just double), a qualified version of a compatible type (so we can read a double as a const double), a character type, and certain other types that do not concern us here.

For a directly defined object, as in double d;, the effective type is the declared type. For dynamically allocated memory, there is no effective type at first, and whenever you store into it using a non-character type, the effective type becomes the type you are using. This means that writing into dynamically allocated memory always satisfies the aliasing rules of 6.5 7.

Reading the memory, as you do in the printf calls, requires more care. Dynamically allocated memory gains an effective type when (C 2018 6.5 6):

  • You store into it using a non-character type.
  • You copy into it using memcpy, memmove, or character-by-character from an object with an effective type.

This means that reading the memory as a double satisfies the aliasing rules as long as the previous write was as a double. They are also satisfied if the memory has no effective type yet, because, for that case, the effective type of a read is the type used for the read. A problem would arise only if you gave the memory some effective type other than double, as by writing into it through uint64_t, and then tried to read it using the type double.