This should be easy enough to test: you'd expect the "wrong" alignment to take precisely twice as long as the other alignments. With your test program running from flash and the default stack in scratch X, the CPU shouldn't touch the main RAM, so a simple program should give accurate answers.
There are only four alignments to test: the striping pattern repeats at intervals of 16 (or to put it another way, only the last hex digit of the addresses matters), and you are obviously doing 32-bit transfers if you care about performance, so the addresses are word aligned. Hence the four cases are offsets of 16n, 16n+4, 16n+8 and 16n+12.
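To make the "last hex digit" point concrete, here is a minimal sketch of the bank calculation. It assumes four banks striped word by word, which is what a 16-byte repeat with 32-bit words implies; the name sram_bank is made up for illustration and is not an SDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: four banks striped word-by-word, so address
 * bits [3:2] select the bank and the pattern repeats every 16 bytes. */
static unsigned sram_bank(uintptr_t addr)
{
    assert((addr & 3u) == 0);            /* 32-bit transfers: word aligned */
    return (unsigned)((addr >> 2) & 3u); /* 0..3, from the last hex digit  */
}
```

With this, addresses ending in 0, 4, 8 and C land in banks 0, 1, 2 and 3 respectively, and anything 16 bytes further on lands in the same bank again.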
My reading of the pipeline description above is that the reads and writes occur 2 cycles apart, so the 'bad' case is 16n+8.
If I'm right, then this is quite convenient: just allocate all of your buffers aligned to a multiple of 16 and you are guaranteed to avoid the clash.
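As a rough sketch of that allocation rule, the buffers can be declared with 16-byte alignment and the relative offset checked. The buffer names, BUF_WORDS and both helper functions are hypothetical, and the 16n+8 test simply encodes my untested guess from above rather than anything confirmed by measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUF_WORDS 256u  /* arbitrary size, for illustration only */

static_assert(sizeof(uint32_t) == 4, "striping arithmetic assumes 32-bit words");

/* C11 _Alignas puts each buffer at an address of the form 16n. */
static _Alignas(16) uint32_t src_buf[BUF_WORDS];
static _Alignas(16) uint32_t dst_buf[BUF_WORDS];

/* Offset of dst relative to src within the 16-byte striping period. */
static unsigned stripe_offset(const void *src, const void *dst)
{
    return (unsigned)(((uintptr_t)dst - (uintptr_t)src) & 15u);
}

/* Assumption (untested): a relative offset of 16n+8 is the clashing case. */
static bool is_suspected_clash(const void *src, const void *dst)
{
    return stripe_offset(src, dst) == 8u;
}
```

Two buffers that both start at 16n always have a relative offset of 0, so whichever of the non-zero offsets turns out to be the slow one, it can't occur for transfers that start at the beginning of both buffers.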
Statistics: Posted by arg001 — Tue Feb 20, 2024 10:11 am