Why does an integer type need to be little-endian?


I am curious about little-endian, and I know that most computers use the little-endian byte order.

So I experimented with a small program; the source is below.

int main() {
    int flag = 31337;
    char c[10] = "abcde";
    int flag2 = 31337;

    return 0;
}

When I looked at the stack via gdb,

I noticed values like 0x00007a69 0x00007a69 ... 0x62610000 0x00656463 ...

So, I have two questions.

For one thing,

how can the value of char c[10] be stored below flag?

I expected the value of flag2 at the top of the stack, the value of char c[10] below flag2, and the value of flag below char c[10],

like this

7a69
"abcde"
7a69

Second,

I expected the values to be stored in little-endian order.

And indeed, the value of "abcde" appeared as '6564636261'.

However, the value of 31337 didn't seem to be stored in little-endian order.

It was just '7a69'. I thought it should be '697a'.

Why doesn't the integer type conform to little-endian?


There are 4 answers

fferri

GDB shows you 0x62610000 0x00656463 because it is interpreting the data (...abcde...) as 32-bit words on a little-endian system.

It could be either way, but the reasonable default is to use native endianness.

Data in memory is just a sequence of bytes. If you tell it to show it as a sequence (array) of short ints, it changes what it displays. Many debuggers have advanced memory view features to show memory content in various interpretations, including string, int (hex), int (decimal), float, and many more.
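For example, gdb can display the same memory in different units (a hypothetical session; the address and surrounding stack contents will differ on your machine):

(gdb) x/8xb c
0x7fffffffe4a0: 0x61  0x62  0x63  0x64  0x65  0x00  0x00  0x00
(gdb) x/2xw c
0x7fffffffe4a0: 0x64636261  0x00000065

The bytes are identical; only the interpretation changes.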

Sami Kuhmonen

There is some confusion in your understanding of endianness, stack and compilers.

First, the locations of variables on the stack may have nothing to do with the order in which they appear in the code. The compiler is free to move them around as it wants, unless they are part of a struct, for example. Compilers usually try to use memory as efficiently as possible, which is why this freedom is needed. For example, the layout char, int, char, int would require 16 bytes (on a 32-bit machine), whereas int, int, char, char would require only 12 bytes; see the sketch below.
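As a rough illustration of that padding cost, here is a minimal sketch using structs (where the compiler may not reorder members); the sizes assume a 4-byte int and 4-byte alignment and may differ on other platforms:

#include <stdio.h>

struct mixed   { char a; int b; char c; int d; };  /* padding after each char */
struct grouped { int b; int d; char a; char c; };  /* chars packed at the end */

int main() {
    printf("mixed:   %zu\n", sizeof(struct mixed));    /* typically 16 */
    printf("grouped: %zu\n", sizeof(struct grouped));  /* typically 12 */
    return 0;
}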

Second, there is no "endianness" in char arrays. They are just that: arrays of values. If you put "abcde" there, the values have to be in that order. If you used UTF-16, for example, then endianness would come into play, since each code unit (not necessarily one character) would require two bytes (on a typical machine with 8-bit bytes). Those bytes would be stored according to endianness, as the sketch below shows.
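Here is a minimal sketch of that difference (assuming <stdint.h> is available; the little-endian output is what you would see on an x86 machine):

#include <stdio.h>
#include <stdint.h>

int main() {
    char ascii[] = "ab";                    /* plain bytes: no endianness */
    uint16_t utf16[] = { 0x0061, 0x0062 };  /* 'a', 'b' as UTF-16 code units */
    unsigned char *p = (unsigned char *)utf16;
    size_t i;

    for (i = 0; i < 2; i++)
        printf("%02x ", (unsigned char)ascii[i]);  /* always: 61 62 */
    printf("\n");
    for (i = 0; i < sizeof(utf16); i++)
        printf("%02x ", p[i]);  /* little-endian: 61 00 62 00 */
    printf("\n");
    return 0;
}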

Decimal value 31337 is 0x00007A69 as a 32-bit hexadecimal value. If you ask a debugger to show it as an integer, it will show it as such regardless of endianness. The only way to see how it sits in memory is to dump it as bytes. Then it would be 0x69 0x7a 0x00 0x00 in little-endian.
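In gdb, for instance (a hypothetical session; the address will differ):

(gdb) x/4xb &flag
0x7fffffffe49c: 0x69  0x7a  0x00  0x00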

Also, even though little-endian is very popular, that is mainly because x86 hardware is popular. Many processors have used big-endian order (SPARC, PowerPC, and MIPS, among others), and some (like older ARM processors) could run in either mode, depending on the requirements.

There is also the term "network byte order", which is in fact big-endian. It dates from before little-endian machines became the most popular.
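That is why code that sends integers over the network converts them explicitly; a small sketch, assuming a POSIX system where <arpa/inet.h> is available:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>  /* htonl: host-to-network (big-endian) conversion */

int main() {
    uint32_t host = 31337;        /* 0x00007a69 in host order */
    uint32_t net  = htonl(host);  /* byte-swapped on little-endian hosts */

    printf("host order: %08x\n", host);  /* 00007a69 */
    printf("net  order: %08x\n", net);   /* 697a0000 on little-endian */
    return 0;
}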

Clifford

Integer byte order is an arbitrary processor design decision. Why, for example, do you appear to be uncomfortable with little-endian? What makes big-endian a better choice?

Well probably because you are a human used to reading numbers from left-to-right; but the machine hardly cares.

There is in fact a reasonable argument that it is intuitive for the least-significant byte to be placed at the lowest address; but again, only from a human point of view.

Pynchia

You got a few excellent answers already. Here is a little code to help you see how variables are laid out in memory, whether little-endian or big-endian:

#include <stdio.h>

/* Print each byte of a variable, lowest address first. */
void show_var(const char *varname, const unsigned char *ptr, size_t size) {
    size_t i;
    printf("%s:\n", varname);
    for (i = 0; i < size; i++) {
        printf("pos %zu = %2.2x\n", i, *ptr++);
    }
    printf("--------\n");
}

int main() {
    int flag = 31337;
    char c[10] = "abcde";

    show_var("flag", (unsigned char *)&flag, sizeof(flag));
    show_var("c", (unsigned char *)c, sizeof(c));
    return 0;
}

On my Intel i5 Linux machine it produces:

flag:
pos 0 = 69
pos 1 = 7a
pos 2 = 00
pos 3 = 00
--------
c:
pos 0 = 61
pos 1 = 62
pos 2 = 63
pos 3 = 64
pos 4 = 65
pos 5 = 00
pos 6 = 00
pos 7 = 00
pos 8 = 00
pos 9 = 00
--------
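If you want your program to report its own endianness, a common self-contained check (a sketch, not tied to any particular platform) looks at the first byte of a known integer:

#include <stdio.h>

int main() {
    unsigned int one = 1;

    /* If the byte at the lowest address of the value 1 is 1,
       the least-significant byte comes first: little-endian. */
    if (*(unsigned char *)&one == 1)
        printf("little-endian\n");
    else
        printf("big-endian\n");
    return 0;
}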