Yesterday, someone showed me this code:
#include <stdio.h>

int main(void)
{
    unsigned long foo = 506097522914230528;
    for (int i = 0; i < sizeof(unsigned long); ++i)
        printf("%u ", *(((unsigned char *) &foo) + i));
    putchar('\n');
    return 0;
}
That results in:
0 1 2 3 4 5 6 7
I am very confused, mainly by the line in the for loop. From what I can tell, &foo is being cast to an unsigned char * and then i is added to it. I think *(((unsigned char *) &foo) + i) is a more verbose way of writing ((unsigned char *) &foo)[i], but this makes it seem like foo, an unsigned long, is being indexed. If so, why? The rest of the loop looks typical of printing all the elements of an array, so everything seems to point to this being true. The cast to unsigned char * confuses me further. I tried searching Google for casting integer types to char * specifically, but my research got stuck after some unhelpful results about casting int to char, itoa(), etc.

506097522914230528 specifically prints 0 1 2 3 4 5 6 7, but other numbers appear to produce their own unique set of 8 numbers, and bigger numbers seem to fill in more zeroes.
As a preface: this program will not necessarily produce the same output as in the question, because it relies on implementation-defined behavior. Tweaking the program slightly can also introduce undefined behavior. More information on this at the end.
The first line of the main function defines an unsigned long foo as 506097522914230528. This seems confusing at first, but in hexadecimal it looks like this: 0x0706050403020100. This number consists of the following bytes, from most significant to least significant: 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00. By now, you can probably see its relation to the output. If you're still confused about how this translates into the output, take a look at the for loop.

Assuming a long is 8 bytes long, this loop runs eight times, once per byte (two hex digits are enough to display all possible values of a byte, and the hex number has 16 digits, so it is made of 8 bytes). Now the really confusing part is the second line. Think about it this way: since two hex digits cover one byte, isolating the last two digits of this number (00) gives a byte value of zero, the two digits before them (01) give one, and so on up to the first two digits (07), which give seven. On a little-endian machine, which the output in the question implies, the least significant byte is stored at the lowest address, so in memory the long behaves like an array which looks like this:

{0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07}

We get the address of foo with &foo, cast it to an unsigned char * so that it points at single bytes (two hex digits each), then use pointer arithmetic to basically get foo[i], as if foo were an array of eight bytes. As I mentioned in my question, this probably looks less confusing as ((unsigned char *) &foo)[i].

A bit of a warning: this program exhibits implementation-defined behavior. This means that it will not necessarily work the same way or give the same output on all implementations of C. Not only is a long 32 bits in some implementations (in which case only four bytes are printed), but the order in which the unsigned long stores the bytes of 0x0706050403020100 (AKA endianness) is also implementation-defined. Credit to @philipxy for pointing out the implementation-defined behavior first. This type punning causes another issue, which @Ruslan pointed out: if the address of the long is cast to a pointer to anything other than char or unsigned char and the result is dereferenced, C's strict aliasing rule comes into play and you will get undefined behavior (credit for the link goes to @Ruslan as well). More detail on these two points in the comment section.