English is not my native language, so my English is poor. Please understand.
I used the LLVM infrastructure to compile some test code that works correctly in IAR, but the generated code does not run on my test board. Details are as follows.
Test goal
I want to see the assembly code generated by LLVM running on my board.
Test environment
- MCU : STM32L152VD (Cortex M3)
- IDE : IAR 8.2
- Debugger : Segger JLink
- LLVM Site : http://ellcc.org/demo/index.cgi
Test steps (summary)
- Create test code that works correctly in IAR.
- Move the test code to http://ellcc.org/demo/index.cgi, select the target, and compile.
- Create a test.s file with the generated assembly code.
- Create a makefile that generates a bin file, and run it with make.
- Load the bin file onto the target board with the JLink program.
Step 1
I wrote the simple code below without any library. It just turns on the LED.
volatile int* _RCC = (int*)(0x40023800);
volatile int* _RCC_AHBENR = (int*)(0x4002381c);
volatile int* _GPIOE = (int*)0x40021000;
volatile int* _GPIOE_BSRR = (int*)(0x40021000 + 0x18);
void InitPort()
{
    const int _RCC_AHBENR_GPIOEEN = (0x00000010);
    int SetOutput = 0x00000600;
    *_RCC_AHBENR = _RCC_AHBENR_GPIOEEN;
    *_GPIOE = SetOutput; // set mode to output
    *_GPIOE_BSRR = 0x00000020; // set
}
int main()
{
    InitPort();
    *_GPIOE_BSRR = 0x00200000; // reset
    while(1);
}
The above code works correctly in IAR.
Step 2
I moved the test code to http://ellcc.org/demo/index.cgi, selected the options shown below, and pressed the compile button.
Step 3
I created a test.s file with the assembly code generated by the site, as below.
.text
.syntax unified
.eabi_attribute 67, "2.09"
.cpu cortex-m3
.eabi_attribute 6, 10
.eabi_attribute 7, 77
.eabi_attribute 8, 0
.eabi_attribute 9, 2
.eabi_attribute 34, 1
.eabi_attribute 17, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 38, 1
.eabi_attribute 18, 4
.eabi_attribute 26, 2
.eabi_attribute 14, 0
.file "_2376_0.c"
.globl InitPort
.p2align 1
.type InitPort,%function
.code 16
.thumb_func
InitPort:
.fnstart
sub sp, #8
movs r0, #16
str r0, [sp, #4]
mov.w r1, #1536
str r1, [sp]
movw r1, :lower16:_RCC_AHBENR
movt r1, :upper16:_RCC_AHBENR
ldr r1, [r1]
str r0, [r1]
ldr r0, [sp]
movw r1, :lower16:_GPIOE
movt r1, :upper16:_GPIOE
ldr r1, [r1]
str r0, [r1]
movw r0, :lower16:_GPIOE_BSRR
movt r0, :upper16:_GPIOE_BSRR
ldr r0, [r0]
movs r1, #32
str r1, [r0]
add sp, #8
bx lr
.Lfunc_end0:
.size InitPort, .Lfunc_end0-InitPort
.cantunwind
.fnend
.globl main
.p2align 1
.type main,%function
.code 16
.thumb_func
main:
.fnstart
push {r7, lr}
mov r7, sp
sub sp, #8
movs r0, #0
str r0, [sp, #4]
bl InitPort
movw r0, :lower16:_GPIOE_BSRR
movt r0, :upper16:_GPIOE_BSRR
ldr r0, [r0]
mov.w lr, #2097152
str.w lr, [r0]
b .LBB1_1
.LBB1_1:
b .LBB1_1
.Lfunc_end1:
.size main, .Lfunc_end1-main
.cantunwind
.fnend
.type _RCC,%object
.data
.globl _RCC
.p2align 2
_RCC:
.long 1073887232
.size _RCC, 4
.type _RCC_AHBENR,%object
.globl _RCC_AHBENR
.p2align 2
_RCC_AHBENR:
.long 1073887260
.size _RCC_AHBENR, 4
.type _GPIOE,%object
.globl _GPIOE
.p2align 2
_GPIOE:
.long 1073876992
.size _GPIOE, 4
.type _GPIOE_BSRR,%object
.globl _GPIOE_BSRR
.p2align 2
_GPIOE_BSRR:
.long 1073877016
.size _GPIOE_BSRR, 4
.ident "ecc version 2017-08-23 (http://ellcc.org) based on clang version 6.0.0 (trunk 311547)"
.section ".note.GNU-stack","",%progbits
Step 4
I created the makefile below to generate the bin file. These are its contents.
bin: test.s
	@echo "Running target all"
	arm-none-eabi-as c:/backend/files/test.s -o c:/backend/files/test.o
	arm-none-eabi-ld -Ttext=0x08000000 c:/backend/files/test.o -o c:/backend/files/test.elf
	arm-none-eabi-objdump -D c:/backend/files/test.elf
	arm-none-eabi-objcopy c:/backend/files/test.elf -O binary c:/backend/files/test.bin
clean:
	@echo "Running target clean"
	rm -f *.o
	rm -f *.elf
	rm -f *.bin
I ran the above makefile with make and got the test.o, test.elf, and test.bin files.
Step 5
I loaded the bin file with JLink.exe (Segger) and ran it with the go command, but nothing happened on the board. (The command I used to load the bin file onto the board was "loadbin C:\backend\files\test.bin, 0x08000000".)
Conclusion
That is everything I did. Unlike the code generated by IAR, the assembly code generated by the LLVM infrastructure does not run. I want to know what I did wrong and how to reach the goal. Any help would be greatly appreciated.
Thanks.
Additional Information
There is no other software on the board, such as an RTOS. The image below shows the whole structure I used for the test. The only source file is main.cpp; the other files were generated by the EWARM IDE.
The contents of the map file are as below.
###############################################################################
#
# IAR ELF Linker V8.22.2.15995/W32 for ARM 24/Oct/2020 19:22:32
# Copyright 2007-2018 IAR Systems AB.
#
# Output file = C:\Users\jjw\Desktop\hobby\Test\Debug\Exe\Test.out
# Map file = C:\Users\jjw\Desktop\hobby\Test\Debug\List\Test.map
# Command line =
# -f C:\Users\jjw\AppData\Local\Temp\EW7E50.tmp
# (C:\Users\jjw\Desktop\hobby\Test\Debug\Obj\main.o -o
# C:\Users\jjw\Desktop\hobby\Test\Debug\Exe\Test.out --redirect
# _Printf=_PrintfFullNoMb --redirect _Scanf=_ScanfFullNoMb --map
# C:\Users\jjw\Desktop\hobby\Test\Debug\List\Test.map --config
# "C:\Program Files (x86)\IAR Systems\Embedded Workbench
# 8.0\arm\CONFIG\generic_cortex.icf" --semihosting --entry
# __iar_program_start --redirect __iar_sh_stdout=__iar_sh_stdout_swo
# --vfe --text_out locale)
#
###############################################################################
*******************************************************************************
*** RUNTIME MODEL ATTRIBUTES
***
CppFlavor = *
__CPP_Exceptions = Disabled
__CPP_Language = C++14
__SystemLibrary = DLib
__dlib_version = 6
*******************************************************************************
*** HEAP SELECTION
***
The basic heap was selected because no calls to memory allocation
functions were found in the application outside of system library
functions, and there are calls to deallocation functions in the
application.
*******************************************************************************
*** PLACEMENT SUMMARY
***
"A0": place at 0x00000000 { ro section .intvec };
"P1": place in [from 0x00000000 to 0x0007ffff] { ro };
define block CSTACK with size = 1K, alignment = 8 { };
define block PROC_STACK with size = 0M, alignment = 8 { };
define block HEAP with size = 2K, alignment = 8 { };
"P2": place in [from 0x20000000 to 0x2000ffff] {
rw, block CSTACK, block PROC_STACK, block HEAP };
initialize by copy { rw };
Section Kind Address Size Object
------- ---- ------- ---- ------
"A0": 0x40
.intvec ro code 0x00000000 0x40 vector_table_M.o [4]
- 0x00000040 0x40
"P1": 0x104
.text ro code 0x00000040 0x3c main.o [1]
.text ro code 0x0000007c 0x2c copy_init3.o [4]
.text ro code 0x000000a8 0x28 data_init.o [4]
.iar.init_table const 0x000000d0 0x14 - Linker created -
.text ro code 0x000000e4 0x1e cmain.o [4]
.text ro code 0x00000102 0x4 low_level_init.o [3]
.text ro code 0x00000106 0x4 exit.o [3]
.text ro code 0x0000010a 0x2 vector_table_M.o [4]
.text ro code 0x0000010c 0xa cexit.o [4]
.rodata const 0x00000116 0x1 unwind_debug.o [5]
.text ro code 0x00000118 0x14 exit.o [5]
.text ro code 0x0000012c 0xc cstartup_M.o [4]
Initializer bytes const 0x00000138 0xc <for P2-1>
.rodata const 0x00000144 0x0 copy_init3.o [4]
- 0x00000144 0x104
"P2", part 1 of 2: 0xc
P2-1 0x20000000 0xc <Init block>
.data inited 0x20000000 0x4 main.o [1]
.data inited 0x20000004 0x4 main.o [1]
.data inited 0x20000008 0x4 main.o [1]
- 0x2000000c 0xc
"P2", part 2 of 2: 0x400
CSTACK 0x20000010 0x400 <Block>
CSTACK uninit 0x20000010 0x400 <Block tail>
- 0x20000410 0x400
*******************************************************************************
*** INIT TABLE
***
Address Size
------- ----
Copy (__iar_copy_init3)
1 source range, total size 0xc:
0x00000138 0xc
1 destination range, total size 0xc:
0x20000000 0xc
*******************************************************************************
*** MODULE SUMMARY
***
Module ro code ro data rw data
------ ------- ------- -------
C:\Users\jjw\Desktop\hobby\Test\Debug\Obj: [1]
main.o 60 12 12
-------------------------------------------
Total: 60 12 12
command line: [2]
-------------------------------------------
Total:
dl7M_tln.a: [3]
exit.o 4
low_level_init.o 4
-------------------------------------------
Total: 8
rt7M_tl.a: [4]
cexit.o 10
cmain.o 30
copy_init3.o 44
cstartup_M.o 12
data_init.o 40
vector_table_M.o 66
-------------------------------------------
Total: 202
shb_l.a: [5]
exit.o 20
unwind_debug.o 1
-------------------------------------------
Total: 20 1
Gaps 1
Linker created 20 1 024
-----------------------------------------------
Grand Total: 291 33 1 036
*******************************************************************************
*** ENTRY LIST
***
Entry Address Size Type Object
----- ------- ---- ---- ------
.iar.init_table$$Base 0x000000d0 -- Gb - Linker created -
.iar.init_table$$Limit 0x000000e4 -- Gb - Linker created -
?main 0x000000e5 Code Gb cmain.o [4]
CSTACK$$Base 0x20000010 -- Gb - Linker created -
CSTACK$$Limit 0x20000410 -- Gb - Linker created -
InitPort() 0x00000041 0x1e Code Gb main.o [1]
Region$$Table$$Base 0x000000d0 -- Gb - Linker created -
Region$$Table$$Limit 0x000000e4 -- Gb - Linker created -
_GPIOE 0x20000004 0x4 Data Gb main.o [1]
_GPIOE_BSRR 0x20000008 0x4 Data Gb main.o [1]
_RCC_AHBENR 0x20000000 0x4 Data Gb main.o [1]
__cmain 0x000000e5 Code Gb cmain.o [4]
__exit 0x00000119 0x14 Code Gb exit.o [5]
__iar_copy_init3 0x0000007d 0x2c Code Gb copy_init3.o [4]
__iar_data_init3 0x000000a9 0x28 Code Gb data_init.o [4]
__iar_debug_exceptions 0x00000116 0x1 Data Gb unwind_debug.o [5]
__iar_program_start 0x0000012d Code Gb cstartup_M.o [4]
__iar_systems$$module {Abs}
0x00000001 Data Gb command line/config [2]
__low_level_init 0x00000103 0x4 Code Gb low_level_init.o [3]
__vector_table 0x00000000 Data Gb vector_table_M.o [4]
_call_main 0x000000f1 Code Gb cmain.o [4]
_exit 0x0000010d Code Gb cexit.o [4]
_main 0x000000ff Code Gb cmain.o [4]
exit 0x00000107 0x4 Code Gb exit.o [3]
main 0x0000005f 0x12 Code Gb main.o [1]
[1] = C:\Users\jjw\Desktop\hobby\Test\Debug\Obj
[2] = command line
[3] = dl7M_tln.a
[4] = rt7M_tl.a
[5] = shb_l.a
291 bytes of readonly code memory
33 bytes of readonly data memory
1 036 bytes of readwrite data memory
Errors: none
Warnings: none
The contents of the icf file are as below.
/*###ICF### Section handled by ICF editor, don't touch! ****/
/*-Editor annotation file-*/
/* IcfEditorFile="$TOOLKIT_DIR$\config\ide\IcfEditor\cortex_v1_4.xml" */
/*-Specials-*/
define symbol __ICFEDIT_intvec_start__ = 0x00000000;
/*-Memory Regions-*/
define symbol __ICFEDIT_region_IROM1_start__ = 0x00000000;
define symbol __ICFEDIT_region_IROM1_end__ = 0x0007FFFF;
define symbol __ICFEDIT_region_IROM2_start__ = 0x0;
define symbol __ICFEDIT_region_IROM2_end__ = 0x0;
define symbol __ICFEDIT_region_EROM1_start__ = 0x0;
define symbol __ICFEDIT_region_EROM1_end__ = 0x0;
define symbol __ICFEDIT_region_EROM2_start__ = 0x0;
define symbol __ICFEDIT_region_EROM2_end__ = 0x0;
define symbol __ICFEDIT_region_EROM3_start__ = 0x0;
define symbol __ICFEDIT_region_EROM3_end__ = 0x0;
define symbol __ICFEDIT_region_IRAM1_start__ = 0x20000000;
define symbol __ICFEDIT_region_IRAM1_end__ = 0x2000FFFF;
define symbol __ICFEDIT_region_IRAM2_start__ = 0x0;
define symbol __ICFEDIT_region_IRAM2_end__ = 0x0;
define symbol __ICFEDIT_region_ERAM1_start__ = 0x0;
define symbol __ICFEDIT_region_ERAM1_end__ = 0x0;
define symbol __ICFEDIT_region_ERAM2_start__ = 0x0;
define symbol __ICFEDIT_region_ERAM2_end__ = 0x0;
define symbol __ICFEDIT_region_ERAM3_start__ = 0x0;
define symbol __ICFEDIT_region_ERAM3_end__ = 0x0;
/*-Sizes-*/
define symbol __ICFEDIT_size_cstack__ = 0x400;
define symbol __ICFEDIT_size_proc_stack__ = 0x0;
define symbol __ICFEDIT_size_heap__ = 0x800;
/**** End of ICF editor section. ###ICF###*/
define memory mem with size = 4G;
define symbol use_IROM1 = (__ICFEDIT_region_IROM1_start__ != 0x0 || __ICFEDIT_region_IROM1_end__ != 0x0);
define symbol use_IROM2 = (__ICFEDIT_region_IROM2_start__ != 0x0 || __ICFEDIT_region_IROM2_end__ != 0x0);
define symbol use_EROM1 = (__ICFEDIT_region_EROM1_start__ != 0x0 || __ICFEDIT_region_EROM1_end__ != 0x0);
define symbol use_EROM2 = (__ICFEDIT_region_EROM2_start__ != 0x0 || __ICFEDIT_region_EROM2_end__ != 0x0);
define symbol use_EROM3 = (__ICFEDIT_region_EROM3_start__ != 0x0 || __ICFEDIT_region_EROM3_end__ != 0x0);
define symbol use_IRAM1 = (__ICFEDIT_region_IRAM1_start__ != 0x0 || __ICFEDIT_region_IRAM1_end__ != 0x0);
define symbol use_IRAM2 = (__ICFEDIT_region_IRAM2_start__ != 0x0 || __ICFEDIT_region_IRAM2_end__ != 0x0);
define symbol use_ERAM1 = (__ICFEDIT_region_ERAM1_start__ != 0x0 || __ICFEDIT_region_ERAM1_end__ != 0x0);
define symbol use_ERAM2 = (__ICFEDIT_region_ERAM2_start__ != 0x0 || __ICFEDIT_region_ERAM2_end__ != 0x0);
define symbol use_ERAM3 = (__ICFEDIT_region_ERAM3_start__ != 0x0 || __ICFEDIT_region_ERAM3_end__ != 0x0);
if (use_IROM1)
{
define region IROM1_region = mem:[from __ICFEDIT_region_IROM1_start__ to __ICFEDIT_region_IROM1_end__];
}
else
{
define region IROM1_region = [];
}
if (use_IROM2)
{
define region IROM2_region = mem:[from __ICFEDIT_region_IROM2_start__ to __ICFEDIT_region_IROM2_end__];
}
else
{
define region IROM2_region = [];
}
define region IROM_region = IROM1_region | IROM2_region;
if (use_EROM1)
{
define region EROM1_region = mem:[from __ICFEDIT_region_EROM1_start__ to __ICFEDIT_region_EROM1_end__];
}
else
{
define region EROM1_region = [];
}
if (use_EROM2)
{
define region EROM2_region = mem:[from __ICFEDIT_region_EROM2_start__ to __ICFEDIT_region_EROM2_end__];
}
else
{
define region EROM2_region = [];
}
if (use_EROM3)
{
define region EROM3_region = mem:[from __ICFEDIT_region_EROM3_start__ to __ICFEDIT_region_EROM3_end__];
}
else
{
define region EROM3_region = [];
}
define region EROM_region = EROM1_region | EROM2_region | EROM3_region;
if (use_IRAM1)
{
define region IRAM1_region = mem:[from __ICFEDIT_region_IRAM1_start__ to __ICFEDIT_region_IRAM1_end__];
}
else
{
define region IRAM1_region = [];
}
if (use_IRAM2)
{
define region IRAM2_region = mem:[from __ICFEDIT_region_IRAM2_start__ to __ICFEDIT_region_IRAM2_end__];
}
else
{
define region IRAM2_region = [];
}
define region IRAM_region = IRAM1_region | IRAM2_region;
if (use_ERAM1)
{
define region ERAM1_region = mem:[from __ICFEDIT_region_ERAM1_start__ to __ICFEDIT_region_ERAM1_end__];
}
else
{
define region ERAM1_region = [];
}
if (use_ERAM2)
{
define region ERAM2_region = mem:[from __ICFEDIT_region_ERAM2_start__ to __ICFEDIT_region_ERAM2_end__];
}
else
{
define region ERAM2_region = [];
}
if (use_ERAM3)
{
define region ERAM3_region = mem:[from __ICFEDIT_region_ERAM3_start__ to __ICFEDIT_region_ERAM3_end__];
}
else
{
define region ERAM3_region = [];
}
define region ERAM_region = ERAM1_region | ERAM2_region | ERAM3_region;
do not initialize { section .noinit };
initialize by copy { readwrite };
if (isdefinedsymbol(__USE_DLIB_PERTHREAD))
{
// Required in a multi-threaded application
initialize by copy with packing = none { section __DLIB_PERTHREAD };
}
place at address mem:__ICFEDIT_intvec_start__ { readonly section .intvec };
if (!isempty(IROM_region))
{
place in IROM_region { readonly };
}
if (!isempty(EROM_region))
{
place in EROM_region { readonly section application_specific_ro };
}
if (!isempty(IRAM_region))
{
define block CSTACK with alignment = 8, size = __ICFEDIT_size_cstack__ { };
define block PROC_STACK with alignment = 8, size = __ICFEDIT_size_proc_stack__ { };
define block HEAP with alignment = 8, size = __ICFEDIT_size_heap__ { };
place in IRAM_region { readwrite, block CSTACK, block PROC_STACK, block HEAP };
}
if (!isempty(ERAM_region))
{
place in ERAM_region { readwrite section application_specific_rw };
}
In the EWARM tool, I uploaded the above source code using the JLink debugger. The connection between the JLink debugger and my board is shown in the picture.
I also tried uploading the bin file generated by the EWARM tool manually (without using the EWARM upload function), as below.
This is the same method I used to upload the bin file generated by LLVM. With the EWARM bin file the LED turns on, but with the LLVM bin file it does not.
I checked the register values (PC, SP, MSP), and they differ between the EWARM bin file and the LLVM bin file.
Below are the register values at the starting point after uploading the EWARM bin file. (This one works.)
Below are the register values at the starting point after uploading the LLVM bin file. (This one does not work.)
I think the cause of this problem may be that the values of the program counter and the (main) stack pointer are incorrect. If this inference is right, how do I configure the initial register values?
If additional information is needed, please tell me. I really want to solve this problem.
So you are on the right path, other than a couple of things. You are using BSRR to set the output pin and then immediately reset it. First off, to turn on the LED, does your board design need the pin low or high? If low, your main.c code is fine. If high, the pin is only high for a handful of clocks, far too short for your eyes to see; you would need a scope or something like that.
I have many STM32 boards with many different chips. I do not have this one or one from this family, but that is fine. I am going to walk through some things to look for and show how you can completely control all of the code. Then you can work backward into your tools, examine the output, and see whether the problem is the binary or how you are loading it into the part. If you can build one way and load with the same tool/command and it "works", but build a different way and it does not work, then one would assume the problem is the build/software and not the loading of the binary.
I am using a NUCLEO-F446RE board. There is an LED on PA5. You have GNU tools and I have GNU tools, so you will be able to use those tools to build this project (and modify it to your needs if you choose to do so).
flash.ld
flash.s
main.c
build
You do not necessarily have to use all of those command line options. Experiment, but examine the output.
Before using the binary, examine it.
The first part is the vector table, and it needs to be at 0x08000000.
I used objdump to generate this, so it is going to try to disassemble these bytes no matter what. When you see the above, what matters is this:
The first item is the stack pointer init value. You likely have much more memory, and it is not uncommon to simply set the stack pointer to the max address plus one, that is 0x20000000 + the amount of RAM. This example barely uses the stack and the application is quite tiny, so 0x1000 bytes is much more than enough.
The next several words are the vectors themselves, and each needs to be the handler address ORed with 1 (the Thumb bit).
If you do not see that, the thing will not boot and it is already game over. Do not try to use the binary until the vector table is linked for the right address and contains, at a minimum, the first two words: the stack pointer init value and the reset handler.
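Below is a rough host-side check of those first two words, run against the .bin before loading it. It is a sketch, not a standard tool. The function name is hypothetical, and the flash/RAM sizes you pass in are assumptions that you take from your part's datasheet. The initial SP is typically RAM base + RAM size, as described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word, the way the Cortex-M core reads
   vector table entries at reset. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical sanity check of a raw .bin image that will be programmed
   at flash_base. The region sizes are parameters because they depend on
   the exact part (assumption: you look them up in the datasheet). */
bool vector_table_looks_sane(const uint8_t *bin, size_t len,
                             uint32_t flash_base, uint32_t flash_size,
                             uint32_t ram_base, uint32_t ram_size)
{
    if (len < 8)
        return false;

    uint32_t sp    = read_le32(bin);     /* word 0: initial stack pointer */
    uint32_t reset = read_le32(bin + 4); /* word 1: reset handler address */

    /* The stack grows down, so "end of RAM + 1" is a valid start value.
       It must also be word aligned. */
    bool sp_ok = sp > ram_base && sp <= ram_base + ram_size &&
                 (sp & 3u) == 0;

    /* A handler entry must have bit 0 set (Thumb) and, with that bit
       cleared, point into the region the image is linked for. */
    uint32_t target = reset & ~1u;
    bool reset_ok = (reset & 1u) != 0 &&
                    target >= flash_base && target < flash_base + flash_size;

    return sp_ok && reset_ok;
}
```

An image linked at 0x08000000 whose first bytes are code rather than a vector table should fail this check. For an image linked at the 0x00000000 boot alias, as the IAR map shows for `.intvec`, pass `flash_base = 0`.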
I included many other vectors to trap faults. If your code is bug free and built right, you do not need them for something like this.
The orr.w instruction indicates this is built for Thumb-2 (ARMv7-M). That is fine for both my board (Cortex-M4) and your board (Cortex-M3). If this were a Cortex-M0 or Cortex-M0+, however, this code would fail and cause a fault. You would then want a fault handler, even if it is just an infinite loop, rather than a vector entry made of instruction bytes that further upset the core and possibly make it worse to debug with a debugger. The unfortunate side effect of how ARM did things, including the unified syntax, is that you cannot tell exactly what you are going to get from the assembly language alone (well, with practice you can). The best way to view it is disassembled.
So there is a chance this code will work. This Nucleo board is mbed style, so it presents itself as a removable drive and you simply copy the .bin file over.
The PUT32/GET32 approach is based on experience; an abstraction layer has many benefits, period. You can also use the volatile pointer thing, and I will show that shortly.
Likewise, it is best to read-modify-write these registers as a habit. This part and these registers are well documented, and this is post-reset code with no other code in front of it (RTOS, libraries, etc.). So it is safe to assume you can simply jam the value into the registers. Note, though, that the clock enable register resets to 0x00008000 on your part, so writing 0x10 to it clears bit 15. Why is that bit enabled at reset? Who knows; check the reference manual.
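The answer's PUT32/GET32 helpers live in assembly (flash.s), which is not reproduced here. The following is a hypothetical C stand-in, plus a read-modify-write helper built on them. Addresses are `uintptr_t` so the sketch also runs on a 64-bit host; on the Cortex-M3 they are plain 32-bit values. One caveat: if a C version of PUT32/GET32 is in the same file as the caller, the compiler can inline it. That removes the extra call overhead, and with it the timing, that separate asm functions give you.

```c
#include <stdint.h>

/* Hypothetical C equivalents of the answer's asm PUT32/GET32. */
void PUT32(uintptr_t addr, uint32_t data)
{
    *(volatile uint32_t *)addr = data;
}

uint32_t GET32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Read-modify-write: clear the bits in clear_mask, then set the bits in
   set_mask, leaving every other bit of the register unchanged. */
void rmw32(uintptr_t addr, uint32_t clear_mask, uint32_t set_mask)
{
    uint32_t ra = GET32(addr);
    ra &= ~clear_mask;
    ra |= set_mask;
    PUT32(addr, ra);
}
```

On the target, `rmw32(0x4002381C, 0, 1u << 4);` would set the GPIOE clock enable from the question while keeping the bit that is already set at reset.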
0x00000020 vs (1<<5) is personal preference, I use either myself depending on the code and situation, in this case my preference is to clearly see the pin number.
This is a simple delay that does not require volatile: in this case the compiler cannot optimize outside the file, so it must implement the loop. The value was hand tuned. Do not expect it to generate a reliable rate of any kind; just make it big enough that the LED blinks neither too fast nor too slow. Once you see it work, change the value (double it, halve it), re-build, re-load, and watch the blink rate change. That is a rough test that the blinker code is the code you just generated and not something left over from you or someone else. You do not want to end up with the bad assumption that some code you made worked, when instead the tools failed you and did not load the new program into flash.
Volatile pointer approach, and this may be related to your issue.
What is this all about:
I can't find the statement in my documentation, but here is the issue. By just jamming the values into both of these registers, only a small number of clocks passes between the time the peripheral is enabled and the time we start trying to write to it. The read-modify-write approach, in particular using the abstraction functions, provided plenty of delay. So in this case I experimentally added a dummy call to burn some time, and this was adequate on my chip.
Using a volatile read-modify-write was also adequate.
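The next sketch shows the enable-then-configure sequence with an extra volatile read of the clock enable register between the two steps. The read-back is an assumed workaround in the spirit of the text above: it adds at least one more bus access between the enable and the first peripheral write. The answer itself only tested a dummy call and a volatile read-modify-write. The function name is hypothetical, and the registers are passed in so the sequence can be exercised on a host.

```c
#include <stdint.h>

/* Hypothetical helper. On the STM32L152 from the question, the arguments
   would be (volatile uint32_t *)0x4002381C (RCC_AHBENR) and
   (volatile uint32_t *)0x40021000 (GPIOE_MODER). */
void enable_then_configure(volatile uint32_t *clk_enr, uint32_t clk_bit,
                           volatile uint32_t *periph_reg, uint32_t value)
{
    /* Read-modify-write the enable so other clocks stay as they were. */
    *clk_enr |= clk_bit;

    /* Dummy read-back. Because the register is volatile, the compiler
       must emit this load, which puts one more bus access between the
       clock enable and the first write to the peripheral. */
    (void)*clk_enr;

    /* Only now touch the peripheral. */
    *periph_reg = value;
}
```

If the LED still fails to come on with this ordering, the race is probably not your problem, and the vector table is the next thing to look at.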
In researching this on another STM32 part, I found that for whatever reason (perhaps this one) you can read the MODER register, or perhaps its reset value, ahead of the clock enable, without the peripheral being enabled at all. With that approach the read goes through, and then the modify-write burns some number of clocks between the processor and the bus, giving the delay needed for the write to work. You may have this problem with your code, and the two compilers may be generating the code differently. I know from research that llvm/clang and gnu have different opinions on what volatile means. We can see that in a minute.
I intentionally did this build so that main.s is generated for the gnu case even though that is an unnecessary step.
Here is the race condition:
The compiler prepped the two stores up front and did them back to back. There are clocks related to the AHB bus, but apparently not enough.
I had not seen the web page you are using. I "simply" build a cross compiler for llvm/clang for these targets (it takes an eternity even on a fast computer). These days that is the only way I can get it to work correctly; the apt-get packages with target triples were not working for version 10 or 11, whichever I tried last. I also roll my own gnu tools from sources, but whatever.
llvm
So with llvm the two stores can also be back to back with no delay. That is not necessarily because of volatile, but because of how the compiler chose to arrange the instructions and which instructions it chose to use.
Also understand that this is gcc version 10.2.0. There is no reason to assume that prior or different versions produce the same code. Nor is there any reason whatsoever to assume that IAR, if it doesn't use gnu or another toolchain underneath, would generate the same code. You need to examine the disassembly and understand where issues can arise. You can easily see how someone who doesn't like my PUT32/GET32 read-modify-write, and simply changes those few lines into volatile pointer accesses, can cause the program to break. With experience, one should see that change in the high-level code as a possible race condition, because the execution speed of those register modifications has changed, and timing does matter. Order certainly matters in a case like this, so re-arranging them will fail, but so does timing. You try to make your code faster, or remove a printf that was there for debug, and then everything breaks. The first thought should be: did I change the code to something functionally equivalent? If so, the next thought is timing: add lots of delays, then start removing them.
You can now easily repeat all of this using my flash.ld and flash.s with your main.c turned into main.s. Or take one of my main.c files and replace the three register addresses with the addresses from your datasheet.
So we might assume that, since you are ideally only changing main.c/main.s, the vector table is not the problem and the binary is otherwise okay.
At least make the MODER write a read-modify-write, or put a delay in, to see whether you are hitting a race condition as well.
The BSRR value indicates your LED is on pin 5 (of port E), which means bit 10 should be set in MODER. With 0x600 you have bits 9 and 10 set, and bit 9 also puts PE4 into alternate-function mode. Was there a reason for that? It will not hurt when trying to get the LED on.
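The bit arithmetic behind that can be checked on a host with a few helpers. In MODER each pin owns two bits: 00 input, 01 output, 10 alternate function, 11 analog. In BSRR, the low half sets pins and the high half resets them. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Mode field of `pin` in a MODER value: 0 input, 1 output,
   2 alternate function, 3 analog. */
uint32_t moder_mode(uint32_t moder, unsigned pin)
{
    return (moder >> (2u * pin)) & 3u;
}

/* MODER value with only `pin` changed to general purpose output. */
uint32_t moder_with_output(uint32_t moder, unsigned pin)
{
    return (moder & ~(3u << (2u * pin))) | (1u << (2u * pin));
}

/* BSRR values: bit n sets pin n, bit n + 16 resets it. */
uint32_t bsrr_set(unsigned pin)   { return 1u << pin; }
uint32_t bsrr_reset(unsigned pin) { return 1u << (pin + 16u); }
```

With these helpers, `moder_mode(0x600, 5)` is 1 (output) and `moder_mode(0x600, 4)` is 2 (alternate function). `bsrr_set(5)` and `bsrr_reset(5)` give the 0x00000020 and 0x00200000 used in the question.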
And then essentially you have
followed very quickly by
and then you go into an infinite loop that no longer changes anything. PE5 should be low forever, or until you reset, and then it gets a high blip a handful or a dozen clocks long.
You probably do not have the race condition in the llvm web page code:
There is still the possibility that it is the tools.
which for me is with your generated assembly language
so the initial problem was right there on your screen.
The vector table looks like this:
and that is simply not going to work. The core might even try to fetch from that address, but it is immediately game over.
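To see concretely why, the sketch below decodes the first eight bytes of the LLVM-built test.bin the way the core does. It assumes the linker placed InitPort first at 0x08000000. It also assumes GNU as picked the usual Thumb encodings for the first instructions: `sub sp, #8` = b082, `movs r0, #16` = 2010, `str r0, [sp, #4]` = 9001, and f44f as the first halfword of `mov.w r1, #1536`. Check those against your own objdump output.

```c
#include <stdint.h>

/* Little-endian 32-bit read, matching how vector words are fetched. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumed first eight bytes of test.bin, i.e. the start of InitPort.
   Each Thumb halfword is stored little-endian (b082 -> 82 b0). */
static const uint8_t llvm_bin_start[8] = {
    0x82, 0xb0, /* sub   sp, #8                          */
    0x10, 0x20, /* movs  r0, #16                         */
    0x01, 0x90, /* str   r0, [sp, #4]                    */
    0x4f, 0xf4, /* mov.w r1, #1536 (first halfword only) */
};

/* Word 0: what the core loads as the initial main stack pointer. */
uint32_t llvm_initial_sp(void) { return le32(llvm_bin_start); }

/* Word 1: what the core takes as the reset handler address. */
uint32_t llvm_reset_vector(void) { return le32(llvm_bin_start + 4); }
```

Under those assumptions, the core starts with an SP of 0x2010B082, far past the end of the part's RAM. It then tries to run code at 0xF44F9000 (bit 0 of 0xF44F9001 is taken as the Thumb bit). That address is in the system region, not flash, and the fault vectors that follow are instruction bytes too.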
So the extremely short answer is that you didn't provide a vector table or a bootstrap.
Now understand in my case this is my bootstrap:
Normally, for an MCU you want to copy .data from flash to RAM and zero .bss. That requires a much more complicated linker script to identify these areas. The linker script and the bootstrap code are intimately related, and both are toolchain specific, so do not assume they port to other toolchains. I do not use .data, nor do I care about .bss items being zero, so my linker script is as trivial as it is. My bootstrap just sets up the stack pointer and enters C code, and since the Cortex-M takes care of the stack pointer, all I have to do is call the C entry point. Because of how the Cortex-M works, you can actually do this:
flash.s
But that only works if you do not rely on .data or .bss, or if, god forbid, you think it is okay to initialize those in C code rather than in a bootstrap (written in asm, of course).
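A C rendering of that direct-to-C vector table might look like the sketch below, with the same limitation (no .data/.bss setup). It assumes GCC/Clang attributes. The section name `.vectors` is an assumption: your linker script must place that section first, at 0x08000000. `notmain` stands in for the C entry point. When compiling for Thumb, the toolchain sets bit 0 of a function's address itself, so you do not OR in the 1 by hand.

```c
#include <stdint.h>

typedef void (*vector_t)(void);

/* Placeholder C entry point. On the target it must never return;
   the blinker's infinite loop would go here. */
void notmain(void)
{
}

/* Word 0: initial stack pointer (0x20001000 gives the 0x1000-byte stack
   discussed above). Word 1: reset handler, pointing straight at C code.
   "used" keeps the table even though nothing references it. */
__attribute__((section(".vectors"), used))
const vector_t vector_table[2] = {
    (vector_t)0x20001000u,
    notmain,
};
```

After linking, the objdump check described earlier still applies: the first word should read 0x20001000, and the second should be notmain's address with bit 0 set.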
The right answer for generic C support is to borrow, modify, or create a complicated linker script. That script gets the tools to help you create variables that identify the beginning and end (or beginning and size) of .data (both in flash and RAM) and of .bss (in RAM), so the bootstrap can copy and zero them. You might even go so far as to generate an int argc (of 1) and an argv[0], at a minimum, just in case the user feels the need.
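The copy and zero loops such a bootstrap performs are sketched below in C, only to make the logic explicit and testable on a host. Per the answer, the real thing belongs in the asm bootstrap. The boundaries are parameters. In a real startup they would come from linker-script symbols, and names like `_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss` are assumptions here, since your script may define others.

```c
#include <stdint.h>

/* Copy .data's initial values from their load address in flash to RAM,
   then zero .bss. This routine itself must not rely on .data or .bss,
   since neither is valid yet when it runs. */
void init_data_bss(const uint32_t *data_load, uint32_t *data_start,
                   uint32_t *data_end, uint32_t *bss_start, uint32_t *bss_end)
{
    while (data_start < data_end)
        *data_start++ = *data_load++;
    while (bss_start < bss_end)
        *bss_start++ = 0;
}

/* Hypothetical target-side use, with assumed linker symbol names:
     extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
     init_data_bss(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
     notmain();                                                   */
```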
C library implementations often include even more linker script stuff. It isn't necessary; it is just how some folks tend to do it. Likewise they add more bootstrap stuff, which, sure, is the right place for some of these things. I do not use C libraries if I can avoid it. They make the project instantly much larger, and so many of them need a fake system that you then have to implement to get them to work.
So clearly my very trivial example is just that. It has severe limitations on what you can do with it, but it demonstrates success. It also isolates you completely from any library code that could be interfering with your success: the library code and its bootstrap items might be doing things that get in the way of your direct register accesses.
Also note that in my implementation I rely on the command line to get the vector table up front. A lot of folks will do this:
and then something like
note that
Is all kinds of broken:
And it will not boot. Always check the vector table on a Cortex-M build before attempting to program the part. It is not so in your case or mine, but on some (many) boards the ability to re-program the part depends heavily on the binary on the part. It must contain all of the loader code, and nothing in it can hang or be broken. There is a laundry list of boards like this; I will not mention any by name.
Many that work with the Arduino environment fall into this case. First off, if you roll your own blinker like this, you will lose the ability to load the part through the sandbox again. And if you were aiming to build all of their code in and this happened, you would still be bricked. You can still get into the STM32 parts with BOOT0 and serial or USB, or with SWD, but some vendors' parts you can easily brick and not be able to recover with SWD. (The JLink you are using uses SWD, Serial Wire Debug, to get into the part and program the flash.)