Seastar's memcached example breaks the pure shared-nothing design and uses cross-core message passing, yet it is still faster than multi-threaded memcached by a large margin. I want to understand the overhead of the various architectures involved.
pairwise message queues
Seastar keeps a message queue for every ordered pair of CPUs, so for n = 16 that is n*(n-1) = 240 queues.
(1) Does every CPU iterate over all 15 of its incoming queues to poll for the next task?
(2) Which data structure is used for the pairwise message queues, and how does it compare to typical MPSC channel implementations (e.g. Rust's tokio mpsc, flume, etc.)? A sketch of what I imagine is below.
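From what I can tell, Seastar builds its cross-core channels out of bounded lock-free SPSC rings (one per ordered pair of cores) and drains them in batches from each reactor's poll loop. Here is a minimal sketch assuming that design; SpscRing, Task, and poll_incoming are my own names for illustration, not Seastar's API:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// One ring per ordered CPU pair: exactly one producer core, one consumer core,
// so no locks and no CAS loops are needed, only acquire/release loads/stores.
template <typename T, std::size_t Capacity>
class SpscRing {
    std::array<T, Capacity> slots_;
    // Each side owns one index; separate cache lines avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to pop
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to push

public:
    bool push(T item) {  // called only by the producing CPU
        auto tail = tail_.load(std::memory_order_relaxed);
        auto next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;  // ring full; caller batches or retries
        slots_[tail] = std::move(item);
        tail_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {  // called only by the consuming CPU
        auto head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return std::nullopt;  // ring empty
        T item = std::move(slots_[head]);
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return item;
    }
};

using Task = std::function<void()>;

// Re question (1): the consumer really would iterate its n-1 incoming rings,
// but draining each one fully per pass amortizes the per-queue polling cost.
void poll_incoming(std::vector<SpscRing<Task, 1024>*>& incoming) {
    for (auto* q : incoming)
        while (auto task = q->pop())
            (*task)();
}
```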
message passing overhead vs. fine-grained locks
If I partition the memcached data into 1,000 shards and guard each shard with its own lock, the probability of lock contention will be very low. The only disadvantage I see is the memory for those 1,000 locks, which is small (see the sketch after question (4) below).
(3) Is this still slower than Seastar's message passing?
(4) Does taking an uncontended lock still introduce context switches?
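For concreteness, here is a minimal sketch of the sharded-lock design I have in mind; ShardedMap, kShards, and shard_for are illustrative names, not from any real memcached codebase:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// 1,000 independent shards, each a hash map behind its own mutex; two keys
// contend only when they hash to the same shard.
class ShardedMap {
    static constexpr std::size_t kShards = 1000;
    struct Shard {
        std::mutex lock;
        std::unordered_map<std::string, std::string> map;
    };
    std::array<Shard, kShards> shards_;

    Shard& shard_for(const std::string& key) {
        return shards_[std::hash<std::string>{}(key) % kShards];
    }

public:
    void set(const std::string& key, std::string value) {
        auto& s = shard_for(key);
        // Question (4) is about this acquisition: does it ever enter the
        // kernel when no other thread holds the lock?
        std::lock_guard<std::mutex> g(s.lock);
        s.map[key] = std::move(value);
    }

    std::optional<std::string> get(const std::string& key) {
        auto& s = shard_for(key);
        std::lock_guard<std::mutex> g(s.lock);
        auto it = s.map.find(key);
        if (it == s.map.end()) return std::nullopt;
        return it->second;
    }
};
```

My understanding is that on Linux an uncontended futex-based mutex is taken entirely in user space with a single atomic instruction, so the fast path should involve no syscall and no context switch, but I would like confirmation that this holds in practice under memcached-like load.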