Why does gcc -O1 optimization break this code modifying VRAM in a loop for a Game Boy Advance ROM?


I'm working on a simple Game Boy Advance ROM and trying to understand why the following code works with the gcc -O0 option, but crashes (white emulator screen) with -O1 or above:

int main () {
    // Set video mode 3 and background 2
    *(unsigned int*)0x04000000 = 0x0403;

    int x;
    for(x = 0; x < 1; x++){
        // Set a single pixel at position (120, 80) in VRAM to red
        ((unsigned short*)0x06000000)[120+80*240] = 0x001F;
    };

    while(1);

    return 0;
}

The loop is obviously unnecessary, but I used it to create a minimal example of the crashing behavior. Without the loop, the code works properly regardless of optimization level. With the loop, it crashes on -O1 and higher, but works on -O0.

The same behavior occurs whether or not I'm actually using x in the body of the loop (for example, using x to calculate a pixel position). As far as I can tell, I get the breakage on higher optimization levels any time I try to do this type of direct memory modification in a loop.

What's going on here? What optimization is breaking the code? And does it point to some issue in the way I'm doing things? Thanks for any help!

A few more details:

  • I'm using devkitpro/devkitarm
  • I'm using NanoBoyAdvance as the emulator
  • I'm running on Ubuntu
  • Full command:
arm-none-eabi-gcc -MMD -MP -MF /path/to/myfile.d  -g -Wall -O1 -mcpu=arm7tdmi -mtune=arm7tdmi -mthumb -mthumb-interwork -iquote /path/to/include -I/opt/devkitpro/libgba/include -I/path/to/build -c /path/to/source/myfile.c -o myfile.o

There is 1 answer

Best answer, by Nate Eldredge

Writes to memory-mapped hardware generally need to be done through pointers to volatile types, e.g. *(volatile unsigned int*)0x04000000 = 0x0403;. Otherwise the compiler is free to assume the stores have no side effects, and it may reorder, merge, or delete them in unexpected ways. Here, this applies both to the store that sets the video mode and to the one that draws the pixel.
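A minimal sketch of that fix follows. The helper names set_display_control and put_pixel are made up for illustration and do not come from libgba; the addresses and values are the ones from the question. Because the pointer parameters are volatile-qualified, each store counts as an observable side effect and has to be emitted:

```c
#define SCREEN_WIDTH 240  /* mode 3 framebuffer width in pixels, as in the question */

/* Hypothetical helper: store a value to the display control register.
   The volatile qualifier makes the store an observable side effect,
   so the compiler may not delete or merge it. */
void set_display_control(volatile unsigned int *dispcnt, unsigned int value) {
    *dispcnt = value;
}

/* Hypothetical helper: write one 16-bit colour value to pixel (x, y). */
void put_pixel(volatile unsigned short *vram, int x, int y, unsigned short color) {
    vram[x + y * SCREEN_WIDTH] = color;
}

/* On the GBA itself, the body of main() from the question would become:
 *
 *     set_display_control((volatile unsigned int *)0x04000000, 0x0403);
 *     for (int x = 0; x < 1; x++)
 *         put_pixel((volatile unsigned short *)0x06000000, 120, 80, 0x001F);
 *     while (1);
 */
```

Taking the pointers as parameters is only a convenience so the helpers can also be exercised against an ordinary array on a host machine; casting the literal addresses to volatile pointers inline, as in the one-liner above, should work just as well.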

(Actually, when writing to video memory, you may be okay with allowing some optimization; e.g. you don't care if all the writes happen in exactly the specified order, so long as they all happen eventually. In that case you can often get away with non-volatile stores followed by some sort of compiler-specific memory barrier that acts as though it may observe them, e.g. asm("" : : : "memory"); in gcc.)
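A rough sketch of that barrier approach, assuming gcc or a compatible compiler such as clang. The COMPILER_BARRIER macro and the fill_row helper are hypothetical names chosen for this example. Note that this is only a compiler-level barrier and emits no instructions of its own:

```c
/* GCC/Clang-specific compiler barrier: emits no instructions, but the
   "memory" clobber tells the compiler the asm may read or write any memory,
   so earlier stores cannot be treated as dead or moved past it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

/* Hypothetical helper: fill one 240-pixel row using plain (non-volatile)
   stores, which the compiler may reorder or combine among themselves,
   then place a barrier so that all of them are actually performed. */
void fill_row(unsigned short *vram, int y, unsigned short color) {
    for (int x = 0; x < 240; x++)
        vram[x + y * 240] = color;
    COMPILER_BARRIER();
}
```

On the GBA this would be called as fill_row((unsigned short *)0x06000000, 80, 0x001F). The trade-off is that the compiler gets to optimize the run of stores as a group, which volatile accesses would prevent.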

What's happened in this case is that the compiler optimized out the first store, which sets the video mode: https://godbolt.org/z/4dn9dhh6j

main:
        ldr     r3, .L3
        mov     r2, #31
        strh    r2, [r3, #240]  @ movhi
.L2:
        b       .L2
.L3:
        .word   100701696
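To confirm that the one remaining strh is the pixel store rather than the mode store, the literal at .L3 can be decoded by hand. Here is a small host-side check of that arithmetic; the function names are made up for illustration:

```c
#include <stdint.h>

/* Byte address of pixel (x, y) in a 240-pixel-wide, 16-bit-per-pixel
   framebuffer starting at 0x06000000, as in the question. */
uint32_t pixel_address(uint32_t x, uint32_t y) {
    return 0x06000000u + 2u * (x + y * 240u);
}

/* Address written by the strh in the listing:
   the literal stored at .L3 plus the #240 immediate offset. */
uint32_t strh_target(void) {
    return 100701696u + 240u;   /* 0x06009600 + 0xF0 */
}
```

Both expressions come out to 0x060096F0, the address of pixel (120, 80), and nothing in the listing writes to 0x04000000.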

I'm not an expert on the details of gcc's optimizer, but my assumption is that the compiler sees that there are no reads from memory anywhere between the store and the infinite loop. Therefore, it assumes that no code in the program can read back the value you wrote to address 0x04000000, since nothing that follows the infinite loop (including code outside of main) can ever execute. If the value can never be read, and the write itself does not have a side effect (assumed because it's not volatile), then it's a dead store and there was no need to write it in the first place.

However, by that logic one would think it would also have optimized out the store that puts the pixel, but it did not. So maybe there is something else going on that I am missing.