Endianness & Storing Characters into Unsigned Integers


I am initializing a symlink in an ext2 inode (school assignment).

I got the idea to do it in hex since the field is defined as uint32_t i_block[EXT2_N_BLOCKS].

As an example:

#include <stdio.h>

int main () {
  // unsigned int is 32 bits on my system
  unsigned int i = 0x68656c6c; // hell
  printf("%.*s\n", 4, &i);
}

I got the output

lleh

Is this because my system is little-endian? Does that mean if I hardcode the opposite order, it would not port to big-endian systems (my eventual goal is hello-world)?

What is the best, most simple way to store a character string into an array of unsigned integers?

There are 2 answers

Lundin

Is this because my system is little-endian?

Yes.

Does that mean if I hardcode the opposite order, it would not port to big-endian systems

Code relying on the byte order of integers is non-portable indeed.

What is the best, most simple way to store a character string into an array of unsigned integers?

The best way is not to use integers at all but char, which unlike integers does not depend on endianness and was actually designed for the purpose of storing characters.

You could ignore that it is an integer type and just memcpy a string into it:

unsigned int i;
memcpy(&i, "hell", 4);

Or if you prefer: memcpy(&i, "\x68\x65\x6c\x6c", 4);.

Otherwise you'll have to invent some ugly hack like for example:

#define LITTLE_ENDIAN  (*(unsigned char*) &(int){0xAA} == 0xAA)
unsigned int i = LITTLE_ENDIAN ? 0x6c6c6568 : 0x68656c6c;

chux - Reinstate Monica

Strictly speaking, printf("%.*s\n", 4, &i); is undefined behavior (UB) as "%.*s" expects a pointer to a character and &i is a pointer to an unsigned int.

A better alternative uses a union.

union {
  unsigned u;
  unsigned char uc[sizeof (unsigned)];
} x = { .u = 0x68656c6c};

printf("%.*s\n", (int) sizeof x.uc, x.uc);

Even better, use uint32_t instead of unsigned.


What is the best, most simple way to store a character string into an array of unsigned integers?

Avoid all endian concerns via a union and initialize via the .uc member.

#include <stdio.h>
#define N 42

int main(void) {
  union {
    unsigned u[N];
    unsigned char uc[sizeof (unsigned[N])];
  } x = { .uc = "Hello"};
  printf("<%.*s>\n", (int) sizeof x.uc, x.uc);
}

Output

<Hello>

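Note that when the initializer has exactly as many characters as .uc[] has elements, no null character is stored, so .uc[] is not a string. Printing with a precision, as above, keeps the read within the array.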