I have been trying to get the SPI peripheral working with the Low-Layer (LL) drivers because the HAL library was too slow for my use case.
I am posting my working code below because I could not find any working examples.
I have been using LL_DMA_STREAM_0 for RX and LL_DMA_STREAM_1 for TX on DMA1 with SPI5.
rx_buffer and tx_buffer are defined as:
#define ADC_BUFFER_SIZE (uint8_t)(24)
static uint8_t tx_buffer[ADC_BUFFER_SIZE] = {0};
static uint8_t rx_buffer[ADC_BUFFER_SIZE] = {0};
I write to the registers directly because the LL inline functions were noticeably slower in a debug build without optimization (about 25-30 % slower).
This is how you need to configure the DMA:
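As a rough sketch of what the stream setup involves (an illustration under stated assumptions, not a verified drop-in): `dma_stream_regs_t` and `dma_stream_setup` are hypothetical names, the struct only mirrors the register order of the CMSIS `DMA_Stream_TypeDef` so the sketch also compiles off-target, and the `SKETCH_` bit positions are written out as I understand the DMA_SxCR layout, so check them against the reference manual. Routing the SPI5 requests to the two streams through DMAMUX1 is assumed to be done elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in mirroring the register order of CMSIS DMA_Stream_TypeDef.
   On target, DMA1_Stream0 / DMA1_Stream1 from stm32h7xx.h would be used. */
typedef struct {
    volatile uint32_t CR;    /* stream configuration  */
    volatile uint32_t NDTR;  /* number of data items  */
    volatile uint32_t PAR;   /* peripheral address    */
    volatile uint32_t M0AR;  /* memory 0 address      */
    volatile uint32_t M1AR;  /* memory 1 address      */
    volatile uint32_t FCR;   /* FIFO control          */
} dma_stream_regs_t;

/* DMA_SxCR bits used below (assumed positions, verify before use). */
#define SKETCH_DMA_CR_EN      (1u << 0)   /* stream enable               */
#define SKETCH_DMA_CR_TCIE    (1u << 4)   /* transfer-complete interrupt */
#define SKETCH_DMA_CR_DIR_M2P (1u << 6)   /* memory-to-peripheral        */
#define SKETCH_DMA_CR_MINC    (1u << 10)  /* increment memory address    */

/* Configure one stream for byte-wide transfers between a fixed peripheral
   register and an incrementing memory buffer. The stream is left disabled. */
static void dma_stream_setup(dma_stream_regs_t *s, uint32_t periph_addr,
                             uint32_t mem_addr, uint16_t len,
                             bool mem_to_periph)
{
    s->CR &= ~SKETCH_DMA_CR_EN;            /* disable before reconfiguring */
    while (s->CR & SKETCH_DMA_CR_EN) { }   /* wait until EN reads back 0   */

    s->PAR  = periph_addr;                 /* e.g. address of RXDR or TXDR */
    s->M0AR = mem_addr;                    /* rx_buffer or tx_buffer       */
    s->NDTR = len;                         /* ADC_BUFFER_SIZE items        */

    uint32_t cr = SKETCH_DMA_CR_MINC;      /* PSIZE = MSIZE = 0: bytes     */
    if (mem_to_periph) {
        cr |= SKETCH_DMA_CR_DIR_M2P;       /* TX stream                    */
    } else {
        cr |= SKETCH_DMA_CR_TCIE;          /* RX stream signals completion */
    }
    s->CR = cr;
}
```

Enabling the transfer-complete interrupt only on the RX stream is a design assumption here: the last received byte arrives after the last transmitted one, so RX completion marks the end of the whole window.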
The following function is the callback that runs when the SPI transmission concludes located in stm32h7xx_it.c:
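A minimal sketch of what such a handler body could do, assuming completion is signalled by the transfer-complete flag of the RX stream (stream 0). The names `rx_stream_irq_body` and `transfer_done` are hypothetical, the registers are passed in as pointers so the logic can be exercised off-target, and the flag masks reflect my reading of the DMA_LISR/DMA_LIFCR layout and should be verified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag positions in DMA_LISR / DMA_LIFCR (verify before use). */
#define SKETCH_DMA_TCIF0    (1u << 5)     /* stream 0 transfer complete */
#define SKETCH_DMA_FLAGS_S0 0x3Du         /* all stream 0 flags         */
#define SKETCH_DMA_FLAGS_S1 (0x3Du << 6)  /* all stream 1 flags         */

/* Set by the handler, polled or consumed by the application. */
static volatile bool transfer_done = false;

/* Body of a hypothetical DMA1_Stream0_IRQHandler. On target it would be
   called as rx_stream_irq_body(&DMA1->LISR, &DMA1->LIFCR). */
static void rx_stream_irq_body(volatile const uint32_t *lisr,
                               volatile uint32_t *lifcr)
{
    if (*lisr & SKETCH_DMA_TCIF0) {
        /* Writing 1 clears a flag; clear both streams so that the next
           transfer starts from a clean state. */
        *lifcr = SKETCH_DMA_FLAGS_S0 | SKETCH_DMA_FLAGS_S1;
        transfer_done = true;
    }
}
```

Keeping the handler this short is deliberate: the interrupt only records that the window has finished, and any processing of rx_buffer happens outside the interrupt context.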
This is the send/receive function:
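The order of the register writes matters when starting a transfer, so here is a hedged sketch of one possible start sequence. `spi_regs_t` and `spi_dma_start` are hypothetical names, the struct mirrors only the first three registers of the CMSIS `SPI_TypeDef` on the H7, and the bit positions are assumptions to check against the reference manual. The sketch also assumes that SPE was cleared after the previous transfer and that the DMA addresses and item counts have already been reloaded.

```c
#include <stdint.h>

/* Stand-in for the first registers of CMSIS SPI_TypeDef (H7 layout).
   On target, SPI5 from stm32h7xx.h would be used. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t CFG1;
} spi_regs_t;

/* Assumed bit positions (verify before use). */
#define SKETCH_SPI_CR1_SPE      (1u << 0)   /* SPI enable             */
#define SKETCH_SPI_CR1_CSTART   (1u << 9)   /* master transfer start  */
#define SKETCH_SPI_CFG1_RXDMAEN (1u << 14)  /* RX DMA request enable  */
#define SKETCH_SPI_CFG1_TXDMAEN (1u << 15)  /* TX DMA request enable  */
#define SKETCH_DMA_CR_EN        (1u << 0)   /* DMA stream enable      */

/* Start one full-duplex transfer of len frames. rx_cr and tx_cr point at
   the CR registers of the RX and TX streams (DMA1_Stream0/1 on target). */
static void spi_dma_start(spi_regs_t *spi, volatile uint32_t *rx_cr,
                          volatile uint32_t *tx_cr, uint16_t len)
{
    spi->CR2 = len;                          /* TSIZE: frames to transfer  */
    spi->CFG1 |= SKETCH_SPI_CFG1_RXDMAEN;    /* 1. RX requests first       */
    *rx_cr |= SKETCH_DMA_CR_EN;              /* 2. arm both DMA streams    */
    *tx_cr |= SKETCH_DMA_CR_EN;
    spi->CFG1 |= SKETCH_SPI_CFG1_TXDMAEN;    /* 3. then TX requests        */
    spi->CR1 |= SKETCH_SPI_CR1_SPE;          /* 4. enable the peripheral   */
    spi->CR1 |= SKETCH_SPI_CR1_CSTART;       /* 5. start clocking (master) */
}
```

Enabling the RX path before the TX path means no received byte can arrive before the RX stream is ready to collect it.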
With this code, a communication window of 24 bytes takes roughly 20 microseconds at an SPI clock of 15 MHz. With the HAL library, reliable transfers took nearly twice as long (roughly 24 ksamples in my case).
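To put the 20 microsecond figure in context, the time the bits alone spend on the wire can be computed from the byte count and the clock rate. The helper below is a small illustration with a hypothetical name (`spi_wire_time_ns`); it assumes 8-bit frames with no idle time between them.

```c
#include <stdint.h>

/* Time the data bits occupy on the wire, in nanoseconds, assuming
   8-bit frames clocked back to back at sck_hz. */
static uint64_t spi_wire_time_ns(uint32_t bytes, uint32_t sck_hz)
{
    return ((uint64_t)bytes * 8u * 1000000000u) / sck_hz;
}
```

For 24 bytes at 15 MHz this gives 12800 ns, so about 12.8 of the roughly 20 microseconds is pure clocking. If the 20 microseconds covers the whole window, the remaining 7 microseconds or so is setup and interrupt overhead.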
Hope this helps.