I'm working on an embedded FPGA-CPU system (a Xilinx Zynq UltraScale+ board) with a cache-coherent CPU and an optionally coherent FPGA. The FPGA uses the AXI4 protocol and can additionally use AXI4 ACE to implement coherence, if desired.
Having read through the ACE protocol specification, I find the behavior well defined when the AXI manager emits a single cache-line transaction, but it's not clear to me what happens if the manager emits a transaction that is longer than one cache line. Does the interconnect generate a series of snoops for the entire transaction? Or are large transactions that indicate participation in cache coherence illegal in AXI4 ACE?
If large transactions are illegal, is there a better-than-naive method for flushing and invalidating a large data segment from all caches in the system?
After an even more thorough read: the specification (pgs. D3-182,183) allows coherence transactions only for a limited subset of burst lengths (1, 2, 4, 8, or 16) instead of the maximum AXI4 burst length of 256. The snoop channels (AC, CR, CD) operate at single cache-line granularity, so the AXI interconnect is responsible for splitting cache-coherent transactions on the AR/AW channels into the corresponding single-line snoops.
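To make the splitting concrete, here is a minimal sketch of the address arithmetic only, not a model of any real interconnect. It assumes a 64-byte cache line and a 16-byte (128-bit) data bus; the function name `snoop_line_addresses` and both constants are hypothetical and chosen for illustration. Check the line size and bus width of your own system before relying on the numbers.

```python
# Assumption: 64-byte cache lines. Adjust for your system.
CACHE_LINE_BYTES = 64

# Burst lengths (in beats) that the post above says are permitted
# for coherent transactions.
COHERENT_BURST_LENGTHS = (1, 2, 4, 8, 16)


def snoop_line_addresses(start_addr, burst_len, bytes_per_beat,
                         line_bytes=CACHE_LINE_BYTES):
    """Return the cache-line base addresses covered by one INCR burst.

    Hypothetical helper: it lists the lines a burst touches, which is
    the set of single-line snoops the interconnect would need to issue
    if it splits the burst as described above.
    """
    if burst_len not in COHERENT_BURST_LENGTHS:
        raise ValueError(
            f"burst length {burst_len} not in {COHERENT_BURST_LENGTHS}")

    total_bytes = burst_len * bytes_per_beat
    # Round the first and last byte addresses down to a line boundary.
    first_line = start_addr - (start_addr % line_bytes)
    last_byte = start_addr + total_bytes - 1
    last_line = last_byte - (last_byte % line_bytes)

    return list(range(first_line, last_line + line_bytes, line_bytes))


if __name__ == "__main__":
    # 16 beats x 16 bytes = 256 bytes, starting on a line boundary.
    for addr in snoop_line_addresses(0x1000, 16, 16):
        print(hex(addr))
```

Under these assumptions, the longest permitted burst on a 128-bit bus covers 256 bytes, so flushing a large buffer still takes one transaction per 256 bytes, each of which fans out into per-line snoops.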