Clarification about Bit-field ordering semantics in C

I have trouble understanding the exact meaning of a paragraph of the C99 draft standard (N1256) about bit-fields (6.7.2.1:10):

6.7.2.1 Structure and union specifiers

[...]

Semantics

[...]

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. *The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.* The alignment of the addressable storage unit is unspecified.

The emphasized sentence stretches my English skills to the limit: I can't tell whether it refers to the ordering of individual bit-fields inside a unit, to the ordering of bits inside an individual bit-field, or to something else.

I'll try to make my doubt clearer with an example. Let's assume that unsigned int is 16 bits wide, that bytes are 8 bits wide, that the implementation chooses an unsigned int as the addressable storage unit, and that no other alignment or padding issues arise:

struct Foo {
    unsigned int x : 8;
    unsigned int y : 8;
};

Thus, assuming the x and y fields are stored inside the same unit, what exactly is implementation-defined according to that sentence? As I understand it, it means that inside that unsigned int unit, x can be stored either at a lower address than y or vice versa, but I'm not sure, since intuitively I'd think that when no bit-field overlaps two underlying storage units, the declaration order would impose the same ordering on the underlying bit-fields.
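
For instance, here is a small probe (just a sketch of my own, assuming nothing beyond C99; the 0xAA/0xBB patterns are arbitrary) that I could use to see what a given implementation actually does:

#include <stdio.h>
#include <string.h>

struct Foo {
    unsigned int x : 8;
    unsigned int y : 8;
};

int main(void)
{
    struct Foo f;
    unsigned char bytes[sizeof f];
    size_t i;

    memset(&f, 0, sizeof f);        /* zero any padding bits too */
    f.x = 0xAA;                     /* distinctive pattern per field */
    f.y = 0xBB;
    memcpy(bytes, &f, sizeof f);    /* read the object representation */

    for (i = 0; i < sizeof f; i++)
        printf("byte %zu: 0x%02X\n", i, bytes[i]);
    return 0;
}

Dumping distinctive patterns makes it easy to see which byte each field landed in, but of course this only reveals what my implementation does, not what the standard guarantees.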

Note: I fear I'm missing some terminology subtlety here (or, worse, some technical one), but I couldn't understand which.

Any pointer appreciated. Thanks!

There are 3 answers

Jens Gustedt (best answer)

I don't really see what is unclear with

The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.

It talks about the allocation of the bit-fields themselves, not the bits inside a field. So, unlike with non-bit-field members, you can't be sure in what order bit-fields inside an addressable unit are allocated.

Otherwise, the representation of the bit-field itself is guaranteed to be "the same" as that of the underlying type, with a division into value bits and a sign bit (if applicable).

In essence, it says that the anatomy of the storage unit that contains the bit-fields is implementation-defined, and that you shouldn't try to access the bits through other means (a union or the like), since that would make your code non-portable.
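
If portability matters, the usual alternative is to drop bit-fields entirely and do the packing by hand. A minimal sketch (the FOO_* constants and helper functions below are made up for illustration, not part of any standard API):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout constants: the bit positions are now fixed by
   the code, not by the implementation's bit-field allocation order. */
#define FOO_X_SHIFT 0
#define FOO_X_MASK  0xFFu
#define FOO_Y_SHIFT 8
#define FOO_Y_MASK  0xFFu

static uint16_t foo_pack(unsigned x, unsigned y)
{
    return (uint16_t)(((x & FOO_X_MASK) << FOO_X_SHIFT) |
                      ((y & FOO_Y_MASK) << FOO_Y_SHIFT));
}

static unsigned foo_get_y(uint16_t f)
{
    return (unsigned)((f >> FOO_Y_SHIFT) & FOO_Y_MASK);
}

int main(void)
{
    uint16_t f = foo_pack(0xAA, 0xBB);
    printf("y = 0x%02X\n", foo_get_y(f));   /* prints y = 0xBB */
    return 0;
}

Here the bit positions are pinned down by the shift constants, so the layout is identical on every conforming implementation.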

Gibbon1

My take on it is that the C99 spec is talking about the bit endianness of the bit-fields, i.e. how they are ordered within a 'unit' (byte, word, etc.). Essentially, you're on your own if you start casting structs.

Example

bit  ex1    ex2   ex3
D7   x3     y0    x0
D6   x2     y1    x1
D5   x1     y2    x2
D4   x0     y3    x3
D3   y3     x0    y0
D2   y2     x1    y1
D1   y1     x2    y2
D0   y0     x3    y3

Above are three different schemes for ordering the two 4-bit fields within a byte-sized 'unit'. All of them are legal as far as the C99 standard is concerned.
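
For reference, the table assumes a declaration along these lines (my reconstruction; note that unsigned char as a bit-field type is itself only conditionally supported, since C99 requires implementations to accept only _Bool, signed int, and unsigned int):

/* Hypothetical declaration behind the table: two 4-bit fields
   packed into one byte-sized unit. */
struct Example {
    unsigned char x : 4;   /* bits x0..x3 in the table */
    unsigned char y : 4;   /* bits y0..y3 in the table */
};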

Adam Haun

Gibbon1's answer is correct, but I think example code is helpful for this sort of question.

#include <stdio.h>

int main(void)
{
    /* The unsigned int x and the bit-field struct share the same
       storage, so writing one bit-field member and reading x back
       shows where the implementation placed that member. */
    union {
        unsigned int x;
        struct {
            unsigned int a : 1;   /* declared first */
            unsigned int b : 10;
            unsigned int c : 20;
            unsigned int d : 1;   /* declared last */
        } bits;
    } u;
    
    u.x = 0x00000000;
    u.bits.a = 1;
    printf("After changing a: 0x%08x\n", u.x);
    u.x = 0x00000000;
    u.bits.b = 1;
    printf("After changing b: 0x%08x\n", u.x);
    u.x = 0x00000000;
    u.bits.c = 1;
    printf("After changing c: 0x%08x\n", u.x);
    u.x = 0x00000000;
    u.bits.d = 1;
    printf("After changing d: 0x%08x\n", u.x);
    
    return 0;
}

On a little-endian x86-64 CPU using MinGW's GCC, the output is:

After changing a: 0x00000001
After changing b: 0x00000002
After changing c: 0x00000800
After changing d: 0x80000000

Since this is a union, the unsigned int (x) and the bit-field structure (a/b/c/d) occupy the same storage unit. The order of allocation of the bit-fields determines whether u.bits.a refers to the least significant bit or to the most significant bit of x. Typically, on a little-endian machine:

u.bits.a == (u.x & 0x00000001)
u.bits.b == (u.x & 0x000007fe) >> 1
u.bits.c == (u.x & 0x7ffff800) >> 11
u.bits.d == (u.x & 0x80000000) >> 31

and on a big-endian machine:

u.bits.a == (u.x & 0x80000000) >> 31
u.bits.b == (u.x & 0x7fe00000) >> 21
u.bits.c == (u.x & 0x001ffffe) >> 1
u.bits.d == (u.x & 0x00000001)

What the standard is saying is that the C language does not require any particular endianness: big-endian and little-endian machines are free to put data in the order that is most natural for their addressing scheme.