The question is like this: Assume we have N machines, and each machine store and can manipulate its N elements, then, how can we find the median of all the N^2 elements in the lowest cost?
It really bothers me much, hope to get answer from you guys, thanks!
Sorry I just write it down too simple. The elements stored in each machine is random, and have no order. And the cost contains I/O cost, as well as communication between machines, RAM, time everything should be considered too. I just want to find the most efficient way to get the median.
These are some solutions I have come up with:
- use external sort like merge sort or something else, and find the median.
- use bucket sort, divide all the elements into X consecutive buckets according to its value, and so we can decide which bucket the median is in. Scan the bucket and we will get the median.
- I think the finding kth number in O(N) algorithm in "Introduction to Algorithms" should work here?
But still, all these solutions need an extra machine to do the job. I'm wondering whether there is a way that we can only use these N machines to get the median?
Thanks!
In the worst case all the remaining elements in the second iteration will be on a single machine.
Complexity: Pretty sure it is O(nlogn) (i.e. including palatalization it can be O(n^2logn)