On an 8-bit platform, I am composing an unsigned 32-bit integer from four 8-bit integers like this:
uint8_t buf[4];
uint32_t large = 0;
large |= ((uint32_t)buf[0]) << 24;
large |= ((uint32_t)buf[1]) << 16;
large |= buf[2] << 8;
large |= buf[3] << 0;
Without the casts the compiler understandably complains:
bmp.c:100:23: warning: left shift count >= width of type [-Wshift-count-overflow]
100 | large |= (buf[1]) << 16;
| ^~
Are these casts expensive (I would guess yes) and can this be done more efficiently?
Here is what I think is the relevant disassembly from avr-gcc (GCC) 13.2.0:
000060ee <.L29>:
large |= ((uint32_t)buf[1]) << 16;
60ee: 91 2c mov r9, r1
60f0: a1 2c mov r10, r1
60f2: b1 2c mov r11, r1
000060f4 <.Loc.91>:
large |= buf[3] << 0;
60f4: a9 2a or r10, r25
000060f6 <.Loc.92>:
large |= buf[2] << 8;
60f6: 50 e0 ldi r21, 0x00 ; 0
000060f8 <.Loc.93>:
60f8: 54 2f mov r21, r20
60fa: 44 27 eor r20, r20
60fc: 05 2e mov r0, r21
60fe: 00 0c add r0, r0
6100: 66 0b sbc r22, r22
6102: 77 0b sbc r23, r23
00006104 <.Loc.94>:
large |= buf[3] << 0;
6104: 84 2a or r8, r20
6106: 95 2a or r9, r21
6108: a6 2a or r10, r22
610a: b7 2a or r11, r23
610c: b8 2a or r11, r24
610e: 80 92 04 01 sts 0x0104, r8 ; 0x800104 <large>
6112: 90 92 05 01 sts 0x0105, r9 ; 0x800105 <large+0x1>
6116: a0 92 06 01 sts 0x0106, r10 ; 0x800106 <large+0x2>
611a: b0 92 07 01 sts 0x0107, r11 ; 0x800107 <large+0x3>
No - it is a problem of the undefined/implementation-defined behaviour in the code; when it is written correctly, the casts do not matter. I would also suggest using pointer notation in parameters (as C passes arrays as pointers) and declaring parameters as const if the function does not change them. It helps the compiler with optimizations (even setting const correctness aside).
Both generate the same machine code:
https://godbolt.org/z/b7o4114EP
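To make "written correctly" concrete: with avr-gcc, int is 16 bits wide, so in the question's code buf[2] << 8 is evaluated in signed 16-bit int. For buf[2] >= 0x80 the result does not fit in int, and the value is then sign-extended to 32 bits (the add r0, r0 / sbc r22, r22 / sbc r23, r23 sequence in the disassembly). A minimal sketch that widens every byte before shifting, assuming a hypothetical function name compose_be and a const pointer parameter as suggested above:

```c
#include <stdint.h>

/* Hypothetical helper: builds a uint32_t from four bytes stored
   most-significant byte first (big-endian order).
   Each byte is converted to uint32_t BEFORE the shift, so no shift
   is ever performed in (16-bit, signed) int. */
uint32_t compose_be(const uint8_t *buf)
{
    uint32_t large = 0;

    large |= (uint32_t)buf[0] << 24;
    large |= (uint32_t)buf[1] << 16;
    large |= (uint32_t)buf[2] << 8;   /* the cast the question's code is missing */
    large |= (uint32_t)buf[3];

    return large;
}
```

Called as compose_be(buf), this should give the same value on targets with 16-bit and 32-bit int, because the width of int no longer takes part in the computation.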
Also, the AVR compiler assumes little-endian, and you are "composing" the uint32_t number from a big-endian representation.
If the endianness matches, I would suggest using memcpy. Optimizing compilers will not actually call memcpy for such a small fixed-size copy.
More interestingly, using unions makes the code much more efficient if the buf data is big-endian, and the resulting code:
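At the source level, the memcpy and union suggestions might look roughly like the following. This is only a sketch under stated assumptions: the function names load_native_memcpy and load_be_union are hypothetical, the target is assumed to be little-endian (as on AVR), and this is not necessarily the exact code behind the listing referred to above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the bytes in buf are already in the target's
   native byte order, so a plain 4-byte copy is enough. */
uint32_t load_native_memcpy(const uint8_t *buf)
{
    uint32_t large;

    memcpy(&large, buf, sizeof large);
    return large;
}

/* Hypothetical helper: buf holds a big-endian value and the target is
   ASSUMED to be little-endian, so the bytes are stored in reverse order.
   Reading u.value after writing u.bytes is permitted type punning in C. */
uint32_t load_be_union(const uint8_t *buf)
{
    union {
        uint32_t value;
        uint8_t  bytes[4];
    } u;

    u.bytes[3] = buf[0];   /* most significant byte  */
    u.bytes[2] = buf[1];
    u.bytes[1] = buf[2];
    u.bytes[0] = buf[3];   /* least significant byte */

    return u.value;
}
```

The union version contains no shifts at all, only four byte moves, which is why it tends to suit an 8-bit target; the trade-off is that it hard-codes the target's byte order.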