I have a PCB with an STM32H723VG and an STM32G473RC MCU. Both MCUs exchange data over SPI (20 MHz Clock, short bus). The H7 is the master, the G4 is the slave. The communication is initiated by the slave, sending some information over an (additional) UART connection, e.g. both MCUs check the sizes of the data buffer. When both agree that it would be OK to communicate, the master starts to transmit.
The data are streamed as a fixed size buffer (~100 bytes), with a frame rate of 5 kHz. The H7 uses Low Level drivers, on the G4 I use the HAL drivers. The SPI on the G4 is in circular mode with double buffer technique and I use the half transfer complete and transfer complete interrupts on the G4 to get an interrupt on each new frame.
This all works very well and is "long term stable" (if you might consider a 30 min. test on my desktop as "long term" ...). As the communication between these two MCUs is critical for my project, I have some algorithms that check whether the data are consistent for each frame. If the checks fail, the stream is stopped and reinitialized after 100 ms, using the same functions as during the first start at power-up.
And here things become weird: the same functions that start up the communication as intended and fully functional do not lead to the same result on re-initialisation. On the second and all subsequent starts of the communication, the SPI Rx buffer on the master (H7) is shifted by 3 bytes. When I press the reset button on the G4, the communication starts up again without errors.
For testing purposes, I made both buffers contain only 0x00, and on each transmission the first byte of the buffer is incremented. On the first start, I can clearly see the first bytes of the Rx and Tx buffers, both on the H7 and on the G4, counting up. After the restart I can see the 3rd byte in the Rx buffer on the H7 counting up, while the Rx buffer on the G4 looks correct.
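With a test pattern like this (all 0x00 except one counter byte), the offset of a received frame can be read off mechanically. The following is only a minimal sketch; the name find_counter_offset is made up for illustration and is not part of HAL or LL. It also cannot tell anything while the counter itself happens to be 0x00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first non-zero byte in a
 * received frame, or -1 if the frame contains only 0x00.
 * With the test pattern described above, 0 means "frame aligned" and any
 * other value is the number of bytes the frame is shifted by. */
static int find_counter_offset(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] != 0x00u) {
            return (int)i;
        }
    }
    return -1; /* all zero: counter wrapped to 0x00 or nothing was received */
}
```

Calling it on the master's Rx buffer after every frame would flag the first frame in which the counter no longer sits at index 0.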
The question is: what could cause that byte shift?
On the Slave Side (G4) I use
HAL_SPI_Abort(&hspi1); // Clear SPI
MX_SPI1_Init(); // (re-)init SPI
to shut down the SPI when the stream is interrupted, plus
HAL_SPI_TransmitReceive_DMA(&hspi1, pTx, pRx, bufSz);
to (re-)start it afterwards. This is exactly the same sequence as on the first start. The only difference is that HAL_SPI_Abort() and MX_SPI1_Init() have no effect on the first run, because the SPI is in its initial state after a hardware reset.
Second question: which side causes the issue, the H7 or the G4? Is the G4 late in shifting out the data, or is the H7 late in reading the data? I have checked the SPIx->MADR register on the G4; the pointer points at the start of the Tx buffer. This seems to be correct. On the H7 the address is set to the correct value by the LL drivers on every frame, so there can't be a problem there either. This means the memory addresses on both sides are correct, but for some reason the buffer is shifted anyway.
The only thing I can imagine so far is that the cleanup of the Tx DMA on the G4 (using HAL_SPI_Abort() and MX_SPI1_Init()) is not as clean as it is supposed to be. How could I check that? By taking a snapshot of all registers and comparing them?
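The snapshot idea could look roughly like the sketch below: copy a block of 32-bit registers once after the first (good) start and once after the re-init, then list the words that differ. The function names are invented for illustration. One caveat I am sure about: reading SPIx_DR pops the Rx FIFO, so a blind word-by-word copy of the whole SPI block has side effects; it is safer to copy only selected registers (or the DMA channel registers) into the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n 32-bit words starting at base into dst, using volatile reads
 * as required for memory-mapped registers. (Hypothetical helper.) */
static void snapshot_regs(const volatile uint32_t *base, uint32_t *dst, size_t n)
{
    assert(base != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[i] = base[i];
    }
}

/* Compare two snapshots of n words. The indices of differing words are
 * stored in diff_idx (at most max entries); the return value is the total
 * number of differing words, which may exceed max. (Hypothetical helper.) */
static size_t diff_snapshots(const uint32_t *a, const uint32_t *b, size_t n,
                             size_t *diff_idx, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            if (count < max) {
                diff_idx[count] = i;
            }
            ++count;
        }
    }
    return count;
}
```

Multiplying a reported index by 4 gives the byte offset of the register inside the copied block, which can then be looked up in the reference manual.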
I found a solution.
First of all, I could clearly see on the scope that the data on the MISO line were late by three bytes. The reason for this is as follows:
My SPI is configured for 8-bit data size, and the FIFO of the SPI on the G4 is 32 bits deep. This means that when the transmission is stopped, 3 bytes remain in the FIFO, because the DMA (in circular mode) is stopped together with the SPI. After the stop there is no clock that could empty the FIFO.
These 3 bytes were all 0x00 in my case, because I filled up the Tx buffer on the Slave with 0x00 to match the size of the Tx buffer of the Master.
When the SPI is restarted, these 3 remaining bytes are "used first", which leads to a reproducible shift of 3 bytes in the Rx buffer of the master.
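To make the mechanism easier to follow, here is a toy model in plain C that runs on a PC. It is not the real hardware behaviour in detail: the 4-byte FIFO depth comes from the description above, while the "DMA tops up the FIFO whenever there is room" rule and all names are simplifying assumptions. Which values the stale bytes have depends on where the stream stopped; what matters is how many there are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_CAP 4u /* assumed: 32-bit Tx FIFO holding four 8-bit frames */

typedef struct {
    uint8_t data[FIFO_CAP];
    size_t head;  /* index of the oldest byte */
    size_t count; /* number of bytes currently queued */
} tx_fifo_t;

static void fifo_push(tx_fifo_t *f, uint8_t b)
{
    assert(f->count < FIFO_CAP);
    f->data[(f->head + f->count) % FIFO_CAP] = b;
    f->count++;
}

static uint8_t fifo_pop(tx_fifo_t *f)
{
    assert(f->count > 0);
    uint8_t b = f->data[f->head];
    f->head = (f->head + 1u) % FIFO_CAP;
    f->count--;
    return b;
}

/* Simulate one frame of len bytes clocked by the master after a (re)start.
 * The slave DMA starts at slave_tx[0] and wraps around (circular mode).
 * Whatever is still queued in f from a previous run is shifted out first. */
static void run_frame(tx_fifo_t *f, const uint8_t *slave_tx, size_t len,
                      uint8_t *master_rx)
{
    size_t dma_pos = 0;
    for (size_t clocked = 0; clocked < len; ++clocked) {
        while (f->count < FIFO_CAP) { /* DMA refills the FIFO */
            fifo_push(f, slave_tx[dma_pos]);
            dma_pos = (dma_pos + 1u) % len;
        }
        master_rx[clocked] = fifo_pop(f); /* master clocks out one byte */
    }
}
```

Running run_frame() once on an empty FIFO delivers slave_tx[0] at index 0 of master_rx and leaves 3 bytes queued. Running it again on the same, uncleared FIFO delivers those 3 stale bytes first, so the new slave_tx[0] only shows up at index 3 of master_rx.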
It's not a bug, it's a feature! RM0440 Rev. 7 Figure 583 (page 1748) might help understanding this.
The solution is ugly:
It is ugly because there is no method to access the FIFO and clear the FIFO level by software. In the manual [1] the FTLVL bits are described as "These bits are set and cleared by hardware.", and as stm32g473xx.h does not even #define a symbol for these two bits in the SR (status register), I did not even try to write them by software.
If anyone knows a more sophisticated solution to clear the TX FIFO of the SPI, please let me know, I'd appreciate it!
[1] RM0440 Rev. 7, page 1780, Chapter 39.9.3 SPI status register (SPIx_SR)