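I don't know the answer, but you can at least minimize the CPU cost by using __builtin_bswap16, which, with any luck, will compile to a single ARM byte-reverse instruction such as REV16. (You may also be able to process several values at a time, but be careful with the wider builtins: __builtin_bswap32 and __builtin_bswap64 reverse every byte in the word, so the 16-bit values also come out in reverse order, and __builtin_bswap128 is only supported on targets where the compiler has 128-bit integer types.)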
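Something like the following is what I have in mind. It is only a rough sketch assuming GCC or Clang and a buffer of 16-bit values; the names swap_bytes_u16 and swap_bytes_2x16 are made up for illustration. The second helper swaps two 16-bit values packed in one 32-bit word with a mask and shift, which keeps the values in their original order (unlike __builtin_bswap32); the compiler may or may not turn it into a single REV16.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit element in place.
 * Hypothetical helper name; relies on the GCC/Clang builtin. */
static void swap_bytes_u16(uint16_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        buf[i] = __builtin_bswap16(buf[i]);
    }
}

/* Swap the bytes inside each 16-bit half of a 32-bit word.
 * The halves stay in place: 0x11223344 becomes 0x22114433. */
static inline uint32_t swap_bytes_2x16(uint32_t w)
{
    return ((w & 0x00FF00FFu) << 8) | ((w >> 8) & 0x00FF00FFu);
}
```

It is worth checking the generated assembly (for example with -O2 and -S) to see which instruction you actually get on your target.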
Statistics: Posted by carlk3 — Sun Jun 02, 2024 10:24 pm