This question is related to cryptography, but I believe I'm asking in the right place (rather than on Crypto Stack Exchange).
The Kuznyechik block cipher splits a 64-bit word into 16 nibbles (4 bits each) and uses them as inputs (indices) to its S-boxes. Each nibble is mixed with 2048 bytes of data in its S-box set, for a total of 32768 bytes processed per 64-bit word. There is an example here: https://github.com/veracrypt/VeraCrypt/blob/master/src/Crypto/kuznyechik.c#L2147-L2149
But let's suppose I want to use a 64-bit word directly.
Which would be faster (i.e., use fewer CPU cycles):
- Splitting a 64-bit word into 16 nibbles and mixing each of them into its own 2048-byte S-box (32768 bytes processed in total across the 16 nibbles), or
- Mixing the entire 64-bit word into a 32768-byte S-box without any splitting?
Keep in mind that the same number of bytes is mixed in both cases.