This is a popular question: what is the most efficient (in time complexity) way to sort 1 million 32-bit integers? Most answers seem to agree that one of the best approaches is radix sort, since the number of bits in those numbers is constant. It's also a common thought exercise when CS students first learn non-comparison-based sorts. However, what I haven't seen described in detail (or at least clearly) is how to choose the radix (or number of buckets) optimally for the algorithm.
In this observed answer, the radix/number of buckets was chosen empirically, and it turned out to be 2^8 for 1 million 32-bit integers. I'm wondering whether there is a more principled way to choose that number. "Introduction to Algorithms" (pp. 198-199) explains that radix sort's running time is Θ(d(n+k)) (d = digits/passes, n = number of items, k = possible values per digit). It goes further and says that given n b-bit numbers and any positive integer r <= b, radix sort sorts the numbers in Θ((b/r)(n+2^r)) time. It then says: "If b >= floor(lg(n)), choosing r ~= floor(lg(n)) gives the best time to within a constant factor...".
But if we choose r = lg(1 million) ~= 20, that is not the 8 that the observed answer arrived at.
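To see where the formula itself lands, here is a rough back-of-the-envelope check of my own (not something from the observed answer) that plugs n = 10^6 and b = 32 into (b/r)(n + 2^r) for a few values of r, counting whole passes as ceil(b/r):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double n = 1e6;  // number of integers
    const double b = 32;   // bits per integer
    const int rs[] = {4, 8, 11, 16, 20};
    for (int r : rs) {
        double passes = std::ceil(b / r);               // whole passes over the data
        double cost = passes * (n + std::pow(2.0, r));  // (b/r)(n + 2^r)
        std::printf("r = %2d  passes = %.0f  cost ~ %.3g\n", r, passes, cost);
    }
}
```

For these inputs the printed cost bottoms out around r = 16 (two passes), not r = 8, so a plain operation count doesn't seem to explain the empirical choice either.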
This tells me that either I'm misinterpreting how the book suggests choosing r and missing something (very likely), or the observed answer didn't choose the optimal value.
Could anyone clear this up for me? Thank you in advance.
The observed answer points to something that seems to want credentials from Google and I'm not keen on "papers, please". However, I think this is best settled empirically, because how long each choice of parameter takes to run depends on the details of caching and other memory-access behaviour. When we work out an algorithm's running time in theory we don't normally use such a detailed model: we just count operations or memory accesses, and we usually discard constant factors so we can use notations like O(n) vs. O(n^2).
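To make the tunable parameter concrete, here is a minimal sketch (not the observed answer's code) of an LSD radix sort over unsigned 32-bit values, where the radix width `radix_bits` is the setting you would be measuring; the histogram, prefix-sum, and scatter loops are exactly the memory-access patterns whose cache behaviour decides which width wins:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of unsigned 32-bit values. radix_bits is the tunable
// "bits per digit": the number of buckets is 2^radix_bits and the number
// of passes over the data is ceil(32 / radix_bits).
void radix_sort(std::vector<uint32_t>& a, unsigned radix_bits) {
    const std::size_t buckets = std::size_t(1) << radix_bits;
    const uint32_t mask = static_cast<uint32_t>(buckets - 1);
    std::vector<uint32_t> tmp(a.size());
    std::vector<std::size_t> count(buckets);

    for (unsigned shift = 0; shift < 32; shift += radix_bits) {
        std::fill(count.begin(), count.end(), 0);
        for (uint32_t x : a) ++count[(x >> shift) & mask];       // histogram
        std::size_t sum = 0;
        for (std::size_t& c : count) {                           // exclusive prefix sums
            std::size_t t = c; c = sum; sum += t;
        }
        for (uint32_t x : a) tmp[count[(x >> shift) & mask]++] = x;  // stable scatter
        a.swap(tmp);                                             // next pass reads from a
    }
}
```

Signed 32-bit integers would need one extra step (e.g. flipping the sign bit before and after), but the tuning question is the same.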
If you were doing a lot of similar radix sorts within a long-running program, you could have it time a series of test runs at startup to choose the best setting. That would ensure it uses the fastest setting even when different computers need different settings because they have different-sized caches or a different ratio of access times between main memory and cache.
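A minimal sketch of that calibration idea, assuming the `radix_sort` sketch above and a made-up `pick_radix_bits` helper: time each candidate width once on representative random data at startup and remember the winner.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>
// ... plus the radix_sort sketch above.

// Hypothetical startup calibration: try a few candidate radix widths on a
// representative sample and keep whichever is fastest on this machine.
unsigned pick_radix_bits(std::size_t n) {
    std::mt19937 rng(12345);                        // fixed seed: repeatable sample
    std::uniform_int_distribution<uint32_t> dist;
    std::vector<uint32_t> sample(n);
    for (uint32_t& x : sample) x = dist(rng);

    const unsigned candidates[] = {4, 8, 11, 16};   // settings to time
    unsigned best_bits = candidates[0];
    std::chrono::duration<double> best = std::chrono::duration<double>::max();
    for (unsigned bits : candidates) {
        std::vector<uint32_t> data = sample;        // fresh unsorted copy each run
        auto t0 = std::chrono::steady_clock::now();
        radix_sort(data, bits);
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        if (dt < best) { best = dt; best_bits = bits; }
    }
    return best_bits;
}
```

Whether the calibration pays off depends on how many sorts the program will run afterwards; for a one-off sort you would just pick a sensible default.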