How should the [u]int_fastN_t types be defined for x86_64, with or without the x32 ABI?


The x32 ABI specifies, among other things, 32-bit pointers for code generated for the x86_64 architecture. It combines the advantages of the x86_64 architecture (including 64-bit CPU registers) with the reduced overhead of 32-bit pointers.

The <stdint.h> header defines typedefs int_fast8_t, int_fast16_t, int_fast32_t, and int_fast64_t (and corresponding unsigned types uint_fast8_t et al.), each of which is:

an integer type that is usually fastest to operate with among all integer types that have at least the specified width

with a footnote:

The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.

(Quoted from the N1570 C11 draft.)
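Whatever width an implementation picks, the standard itself only fixes minimum ranges for these types, through the INT_FASTN_MAX / UINT_FASTN_MAX macros in <stdint.h>. A small sketch of compile-time checks that should hold on any conforming implementation, x32 or not; fast16_bits is a hypothetical helper name:

```c
#include <limits.h>
#include <stdint.h>

/* Minimum ranges the standard requires for the fast types. These hold on
   every conforming implementation, with or without the x32 ABI. */
_Static_assert(INT_FAST16_MAX >= 32767, "int_fast16_t has at least 16 bits");
_Static_assert(UINT_FAST16_MAX >= 65535u, "uint_fast16_t has at least 16 bits");
_Static_assert(INT_FAST32_MAX >= 2147483647L, "int_fast32_t has at least 32 bits");
_Static_assert(UINT_FAST32_MAX >= 4294967295u, "uint_fast32_t has at least 32 bits");

/* Storage width of uint_fast16_t in bits, computed the same way as in the
   test program below: 16, 32 and 64 are all permitted. */
unsigned fast16_bits(void)
{
    return (unsigned)(CHAR_BIT * sizeof(uint_fast16_t));
}
```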

The question is, how should [u]int_fast16_t and [u]int_fast32_t types be defined for the x86_64 architecture, with or without the x32 ABI? Is there an x32 document that specifies these types? Should they be compatible with the 32-bit x86 definitions (both 32 bits) or, since x32 has access to 64-bit CPU registers, should they be the same size with or without the x32 ABI? (Note that the x86_64 has 64-bit registers regardless of whether the x32 ABI is in use or not.)

Here's a test program (which depends on the gcc-specific __x86_64__ macro):

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void) {
#if defined __x86_64__ && SIZE_MAX == 0xFFFFFFFF
    puts("This is x86_64 with the x32 ABI");
#elif defined __x86_64__ && SIZE_MAX > 0xFFFFFFFF
    puts("This is x86_64 without the x32 ABI");
#else
    puts("This is not x86_64");
#endif
    printf("uint_fast8_t  is %2zu bits\n", CHAR_BIT * sizeof (uint_fast8_t));
    printf("uint_fast16_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast16_t));
    printf("uint_fast32_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast32_t));
    printf("uint_fast64_t is %2zu bits\n", CHAR_BIT * sizeof (uint_fast64_t));
}

When I compile it with gcc -m64, the output is:

This is x86_64 without the x32 ABI
uint_fast8_t  is  8 bits
uint_fast16_t is 64 bits
uint_fast32_t is 64 bits
uint_fast64_t is 64 bits

When I compile it with gcc -mx32, the output is:

This is x86_64 with the x32 ABI
uint_fast8_t  is  8 bits
uint_fast16_t is 32 bits
uint_fast32_t is 32 bits
uint_fast64_t is 64 bits

(which, apart from the first line, matches the output with gcc -m32, which generates 32-bit x86 code).
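Because the width of uint_fast16_t differs between the -m64, -mx32 and -m32 builds, code that relies on it wrapping at 16 bits behaves differently across them. A minimal sketch, using a hypothetical helper inc16, that masks explicitly so the result is the same on all three:

```c
#include <stdint.h>

/* Increment a 16-bit counter kept in a uint_fast16_t. Without the mask,
   65535 + 1 would give 0 only where uint_fast16_t is exactly 16 bits;
   with the 32- and 64-bit typedefs shown above it would give 65536. */
uint_fast16_t inc16(uint_fast16_t x)
{
    return (uint_fast16_t)((x + 1u) & 0xFFFFu);
}
```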

Is this a bug in glibc (which defines the <stdint.h> header), or is it following some x32 ABI requirement? There are no references to the [u]int_fastN_t types in either the x32 ABI document or the x86_64 ABI document, but there could be something else that specifies it.

One could argue that the fast16 and fast32 types should be 64 bits with or without x32, since 64-bit registers are available; would that make more sense than the current behavior?

(I've substantially edited the original question, which asked only about the x32 ABI. The question now asks about x86_64 with or without x32.)


There are 3 answers

gnasher729

Tough. Let's just take int_fast8_t. If a developer uses a large array to store lots of 8-bit signed integers, then int8_t will be fastest because of caching. I'd say that using large arrays of int_fast8_t is likely a bad idea.

You'd need to take a large codebase and systematically replace int8_t, signed char, and plain char (where plain char is signed) with int_fast8_t. Then benchmark the code with different typedefs for int_fast8_t and measure which is fastest.
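A skeleton for that kind of measurement, assuming a hypothetical TEST8 macro that selects the signed type at build time (for example gcc -O2 -DTEST8=int8_t versus -DTEST8=int32_t). The helper names fill, sum and time_sum are illustrative, and clock() only gives a rough measure of processor time:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical benchmark knob: rebuild with -DTEST8=<signed type>. */
#ifndef TEST8
#define TEST8 int_fast8_t
#endif

typedef TEST8 test8;

/* Fill the array with values in [-50, 49], which fit in 8 signed bits. */
void fill(test8 *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (test8)((int)(i % 100) - 50);
}

/* Sum the array, accumulating in a wide type so the sum cannot overflow. */
long long sum(const test8 *a, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Time one pass over the array; returns elapsed processor time in seconds
   and stores the sum through out so the work cannot be optimized away. */
double time_sum(const test8 *a, size_t n, long long *out)
{
    clock_t start = clock();
    *out = sum(a, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running the same binary over a large array for each TEST8 choice, and comparing the time_sum results, is the experiment described above in miniature; a real codebase would of course exercise far more than one loop.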

Note that implementation-defined behaviour is going to change. For example, assigning 255 gives -1 (with gcc) if the type is int8_t, but 255 if the type is wider.
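A small illustration of that difference. The helper names are hypothetical, and the -1 for int8_t relies on gcc's documented behaviour of reducing out-of-range values modulo 2^N on conversion to a signed type:

```c
#include <stdint.h>

/* 255 is out of range for int8_t, so the result is implementation-defined;
   gcc reduces it modulo 2^8, giving -1. */
int store_in_int8(int v)
{
    int8_t x = (int8_t)v;
    return x;
}

/* The same conversion to int_fast8_t yields -1 only where int_fast8_t is
   8 bits wide (as in the question's gcc output); a wider typedef keeps 255. */
int store_in_fast8(int v)
{
    int_fast8_t x = (int_fast8_t)v;
    return (int)x;
}
```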

rodrigo

I have compiled the following sample code to check the generated code for a simple sum with different integer types:

#include <stdint.h>

typedef int16_t INT;
//typedef int32_t INT;
//typedef int64_t INT;

INT foo()
{
    volatile INT a = 1, b = 2;
    return a + b;
}

And then I disassembled the code generated with each of the integer types. The compilation command is gcc -Ofast -mx32 -c test.c. Note that in full 64-bit mode the generated code will be almost the same because there are no pointers in my code (only %rsp instead of %esp).
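To repeat the comparison without editing the typedef each time, a sketch that takes the type from a hypothetical TEST_INT macro. It could be built with, for example, gcc -Ofast -mx32 -c -DTEST_INT=int32_t test.c and then inspected with objdump -d test.o:

```c
#include <stdint.h>

/* Type under test; override on the command line with -DTEST_INT=... */
#ifndef TEST_INT
#define TEST_INT int16_t
#endif

typedef TEST_INT INT;

INT foo(void)
{
    /* volatile forces both values through memory, as in the original */
    volatile INT a = 1, b = 2;
    return a + b;
}
```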

With int16_t it emits:

00000000 <foo>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   ba 02 00 00 00          mov    $0x2,%edx
   a:   67 66 89 44 24 fc       mov    %ax,-0x4(%esp)
  10:   67 66 89 54 24 fe       mov    %dx,-0x2(%esp)
  16:   67 0f b7 54 24 fc       movzwl -0x4(%esp),%edx
  1c:   67 0f b7 44 24 fe       movzwl -0x2(%esp),%eax
  22:   01 d0                   add    %edx,%eax
  24:   c3                      retq   

With int32_t:

00000000 <foo>:
   0:   67 c7 44 24 f8 01 00 00 00  movl   $0x1,-0x8(%esp)
   9:   67 c7 44 24 fc 02 00 00 00  movl   $0x2,-0x4(%esp)
  12:   67 8b 54 24 f8              mov    -0x8(%esp),%edx
  17:   67 8b 44 24 fc              mov    -0x4(%esp),%eax
  1c:   01 d0                       add    %edx,%eax
  1e:   c3                          retq   

And with int64_t:

00000000 <foo>:
   0:   67 48 c7 44 24 f0 01 00 00 00   movq   $0x1,-0x10(%esp)
   a:   67 48 c7 44 24 f8 02 00 00 00   movq   $0x2,-0x8(%esp)
  14:   67 48 8b 54 24 f0               mov    -0x10(%esp),%rdx
  1a:   67 48 8b 44 24 f8               mov    -0x8(%esp),%rax
  20:   48 01 d0                        add    %rdx,%rax
  23:   c3                              retq   

Now, I don't claim to know exactly why the compiler generated exactly this code (maybe the volatile keyword combined with a non-register-size integer type is not the best choice?). But from that generated code we can draw the following conclusions:

  1. The slowest type is int16_t. It needs additional instructions to move the values around.
  2. The fastest type is int32_t. Although the 32-bit and the 64-bit versions have the same number of instructions, the 32-bit code is shorter in bytes, so it is more cache-friendly and therefore likely faster.

So the natural choices for the fast types would be:

  1. For int_fast16_t, choose int32_t.
  2. For int_fast32_t, choose int32_t.
  3. For int_fast64_t, choose int64_t (what else?).
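Written out as typedefs, those choices would look roughly like the sketch below. The my_ names are hypothetical, chosen so they don't clash with the real <stdint.h> types; for gcc -m64 this differs from what glibc provides, as the question's output shows:

```c
#include <stdint.h>

/* Hypothetical typedefs following the conclusions above:
   32 bits for the 16- and 32-bit fast types, 64 bits for the 64-bit one. */
typedef int32_t  my_int_fast16_t;
typedef int32_t  my_int_fast32_t;
typedef int64_t  my_int_fast64_t;
typedef uint32_t my_uint_fast16_t;
typedef uint32_t my_uint_fast32_t;
typedef uint64_t my_uint_fast64_t;

/* Limits matching the chosen types, mirroring INT_FASTN_MAX and friends. */
#define MY_INT_FAST16_MAX  INT32_MAX
#define MY_INT_FAST32_MAX  INT32_MAX
#define MY_INT_FAST64_MAX  INT64_MAX
#define MY_UINT_FAST16_MAX UINT32_MAX
#define MY_UINT_FAST32_MAX UINT32_MAX
#define MY_UINT_FAST64_MAX UINT64_MAX
```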
Ross Ridge

Generally speaking, you would expect 32-bit integer types to be marginally faster than 64-bit integer types on x86-64 CPUs, partly because they use less memory, but also because 64-bit instructions require an extra prefix byte over their 32-bit counterparts. The 32-bit division instruction is significantly faster than the 64-bit one, but otherwise instruction execution latencies are the same.
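When 64-bit operands are often small, the division difference can be exploited by hand. A sketch with a hypothetical div_u64 that falls back to the full 64-bit divide only when either operand needs more than 32 bits; whether it pays off depends on the CPU and on how often the fast path is taken:

```c
#include <stdint.h>

/* Divide two 64-bit unsigned values (d must be nonzero), using a 32-bit
   division when both operands fit in 32 bits. */
uint64_t div_u64(uint64_t n, uint64_t d)
{
    if (((n | d) >> 32) == 0)
        return (uint32_t)n / (uint32_t)d;   /* 32-bit divide */
    return n / d;                            /* full 64-bit divide */
}
```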

It isn't normally necessary to extend 32-bit values when loading them into 64-bit registers. Although the CPU automatically zero-extends the value in this case, this is usually a benefit only because it avoids partial register stalls: what gets loaded into the upper part of the register matters less than the fact that the entire register is modified. The contents of the upper part don't matter, because registers holding 32-bit types are normally only used with 32-bit instructions, which operate only on the lower 32 bits.
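A sketch for comparing the two kinds of widening in a disassembler. With gcc -O2 on x86_64, the first function typically compiles to a single 32-bit mov (the upper half is cleared for free) and the second to a movslq sign extension, though the exact output depends on the compiler version:

```c
#include <stdint.h>

/* Unsigned widening: writing a 32-bit register already clears bits 32-63. */
uint64_t widen_u32(uint32_t x)
{
    return x;
}

/* Signed widening: needs an explicit sign-extension instruction. */
int64_t widen_i32(int32_t x)
{
    return x;
}
```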

The inconsistency between the sizes of the int_fast32_t types under the x32 and x86-64 ABIs is probably best justified by the fact that pointers are 64 bits wide under the x86-64 ABI. Whenever a 32-bit integer is added to a pointer it needs to be extended to 64 bits, which makes this a much more frequent occurrence with the x86-64 ABI.
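A sketch of that case, with illustrative function names. With gcc -O2 -m64, the int32_t index typically has to be sign-extended (movslq) before it can be used in the address computation, while the pointer-sized ptrdiff_t index is used directly:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit signed index: must be widened to 64 bits under the x86-64 ABI. */
int load_i32(const int *p, int32_t i)
{
    return p[i];
}

/* Pointer-sized index: usable in the address without extension. */
int load_ptrdiff(const int *p, ptrdiff_t i)
{
    return p[i];
}
```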

Another factor to consider is that the whole point of the x32 ABI is to get better performance by using smaller types. Any application that benefits from pointers and related types being smaller should also benefit from int_fast32_t being smaller.