I just can't seem to understand how this would work.
Question:
Given a sequential file that contains at most four billion 32 bit integers in random order, find a 32-bit integer that isn't in the file (and there must be at least one missing)
Answer:
it is helpful to view this binary search in terms of the 32 bits that represent each integer. In the first pass of the algorithm we read the (at most) four billion input integers and write those with a leading zero bit to one sequential file and those with a leading one bit to another file.
One of those files contains at most two billion integer, so we next use that file as the current input and repeat the probe process, but this time on the second bit.
So by splitting the file over and over again (binary search) how would this actually lead me to the missing 32 bit integer?
After each pass your next pass will be on the smaller of the two lists you've compiled.
At some point, you MUST encounter an empty list and this will determine your number. For example let's just use 3 bit numbers.
after the first pass we have
Then we look at the 2nd bits in the first list because it is smaller than (or equal to) the second. We would split them into
notice how the file that would start with
01
is empty, this means that there are no numbers that start with01
so010
and011
are missing.The reason we must eventually have a missing list is because we are choosing the smaller list for our next pass each time.