Indexing an `unsigned long` variable and printing the result


Yesterday, someone showed me this code:

#include <stdio.h>

int main(void)
{
    unsigned long foo = 506097522914230528;
    for (int i = 0; i < sizeof(unsigned long); ++i)
        printf("%u ", *(((unsigned char *) &foo) + i));
    putchar('\n');

    return 0;
}

That results in:

0 1 2 3 4 5 6 7

I am very confused, mainly by the line in the for loop. From what I can tell, &foo is cast to an unsigned char * and then i is added to it. I think *(((unsigned char *) &foo) + i) is a more verbose way of writing ((unsigned char *) &foo)[i], but this makes it seem like foo, an unsigned long, is being indexed. If so, why? The rest of the loop looks like the typical way of printing all elements of an array, so everything seems to point to this being true.

The cast to unsigned char * confuses me further. I tried searching on Google for casting integer types to char * specifically, but my research got stuck after some unhelpful results about casting int to char, itoa(), etc.

506097522914230528 specifically prints out 0 1 2 3 4 5 6 7, but other numbers appear to have their own unique 8 numbers shown in the output, and bigger numbers seem to fill in more zeroes.


There are 2 answers

21
mediocrevegetable1 On

As a preface, this program will not necessarily behave exactly as it does in the question, since it exhibits implementation-defined behavior. In addition, tweaking the program slightly can cause undefined behavior. More information on this at the end.

The first line of the main function defines an unsigned long foo as 506097522914230528. This seems confusing at first, but in hexadecimal it looks like this: 0x0706050403020100.

This number consists of the following bytes, from most significant to least significant: 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00. By now, you can probably see its relation to the output. If you're still confused about how this translates into the output, take a look at the for loop.

for (int i = 0; i < sizeof(unsigned long); ++i)
    printf("%u ", *(((unsigned char *) &foo) + i));

Assuming an unsigned long is 8 bytes, this loop runs eight times: two hex digits are enough to display all possible values of a byte, and the hex number has 16 digits, so it occupies 16 / 2 = 8 bytes. Now the really confusing part is the second line. Think about it this way: since two hex digits correspond to one byte, isolating the first two digits of this number (07) gives a byte value of seven, and isolating the last two digits (00) gives a byte value of zero. Now, assume the long is actually an array of bytes which, on a little-endian machine like the one that produced the output above, looks like this:

{00, 01, 02, 03, 04, 05, 06, 07}

We get the address of foo with &foo, cast it to unsigned char * so that dereferencing it reads a single byte (two hex digits), then use pointer arithmetic to get what would be foo[i] if foo were an array of eight bytes. As I mentioned in my question, this probably looks less confusing as ((unsigned char *) &foo)[i].


A bit of a warning: this program exhibits implementation-defined behavior, so it will not necessarily give the same output on every implementation of C. Not only is a long 32 bits in some implementations, but the order in which the bytes of 0x0706050403020100 are stored in the unsigned long (AKA endianness) is also implementation-defined. Credit to @philipxy for pointing out the implementation-defined behavior first. This type punning raises another issue, which @Ruslan pointed out: if the long is accessed through a pointer to some other incompatible type rather than a character type such as char * or unsigned char *, C's strict aliasing rule comes into play and you get undefined behavior (credit for the link goes to @Ruslan as well). More detail on these two points in the comment section.

2
Lundin On

There's already an answer explaining what the code does, but since this post for some reason is getting a lot of strange attention and getting repeatedly closed for the wrong reasons, here are some more insights on what the code does, what C guarantees and what it does not guarantee:


  • unsigned long foo = 506097522914230528;. This integer constant is roughly 506 * 10^15. It may or may not fit inside an unsigned long, depending on whether long is 4 or 8 bytes on your system (implementation-defined).

    With a 4-byte long, the value gets truncated to 0x03020100 1).

    With an 8-byte long, the type can hold numbers up to about 18.44 * 10^18, so the value fits.

  • ((unsigned char *) &foo) is a valid pointer conversion and well-defined behavior. C17 6.3.2.3/7 makes this guarantee:

    A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

    The concern about alignment does not apply since we have a pointer to character.

    If we keep reading 6.3.2.3/7:

    When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

    This is a special rule allowing us to inspect any type in C through a character type. Whether the successive increments are done with pointer++ or with pointer arithmetic pointer + i doesn't matter, as long as we keep pointing within the inspected object, which i < sizeof(unsigned long) ensures. This is well-defined behavior.

  • Another special rule "strict aliasing" that was mentioned contains a similar exception for characters. It is in sync with the 6.3.2.3/7 rule. Specifically, "strict aliasing" allows (C17 6.5/7):

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
    ...

    • a character type.

    The "stored object" in this case is unsigned long and should normally only get accessed as such. However, when the unsigned char* is de-referenced with * we access it as a character type. This is allowed by the exception to the strict aliasing rule mentioned above.

    As a side note, the other way around, accessing an array of unsigned char arr[sizeof(long)] through an *(unsigned long*)arr lvalue access would have been a strict aliasing violation and undefined behavior. But this is not the case here.

  • Using %u to print a character is, strictly speaking, not correct, since printf then expects an unsigned int. However, since printf is a variadic function, it comes with some oddball implicit promotion rules that make this code well-defined. The unsigned char value gets promoted by the default argument promotions 2) to type int. printf then internally reinterprets this int as unsigned int. It can't be a negative value because we started from unsigned char. The conversion 3) is well-defined and portable.

  • So we get the byte values one by one. The hex representation is 07 06 05 04 03 02 01 00, but how this is stored in an unsigned long is CPU-specific/implementation-defined behavior. This is a very common FAQ; see "What is CPU endianness?", which contains a very similar example to this code.

    On little endian it will print 0 1 2 ..., on big endian (with an 8-byte long) it will print 7 6 5 ....


1) See the unsigned integer conversion rule C17 6.3.1.3/2.
2) C17 6.5.2.2/6.
3) C17 6.3.1.3/1 "When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged."