Amortized worst case complexity of binary search


For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?

Found this on my review sheet for my final exam. I can't even figure out why we would want amortized time complexity for binary search, because its worst case is O(log n). According to my notes, amortized cost calculates the upper bound of an algorithm and then divides it by the number of items, so wouldn't it be as simple as the worst-case time complexity divided by the number of elements, i.e. O(log n)/(2^n - 1)?

For reference, here is the binary search I've been using:

public static boolean binarySearch(int x, int[] sorted) {
    int s = 0; // start
    int e = sorted.length - 1; // end

    while (s <= e) {
        int mid = s + (e - s)/2; // midpoint, written this way to avoid overflow
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}

There are 2 answers

templatetypedef

I'm honestly not sure what this means - I don't see how amortization interacts with binary search.

Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
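To make that concrete, here is a rough sketch of that averaging experiment (not part of the question, just illustrative); probeCount is an instrumented copy of the search above that reports how many midpoints it inspected:

public class AverageProbes {
    // Instrumented copy of the questioner's search: returns how many
    // midpoints were inspected before x was found.
    static int probeCount(int x, int[] sorted) {
        int s = 0, e = sorted.length - 1, probes = 0;
        while (s <= e) {
            int mid = s + (e - s) / 2;
            probes++;
            if (sorted[mid] == x) return probes;
            else if (sorted[mid] < x) s = mid + 1;
            else e = mid - 1;
        }
        return probes; // not found (doesn't happen below)
    }

    public static void main(String[] args) {
        int n = (1 << 10) - 1;              // 2^10 - 1 elements, as in the question
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;

        long total = 0;
        for (int i = 0; i < n; i++) total += probeCount(sorted[i], sorted);

        // One element needs 1 probe, two need 2, four need 3, ...
        // so the average comes out near log2(n) - 1, i.e. O(log n).
        System.out.println("average probes: " + ((double) total / n));
    }
}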

Hope this helps!

ganbustein

Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item. (Either don't count them at all, or count one for each gap where a missing item could be.)

So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).

With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons.
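If you want to check the arithmetic numerically rather than by hand, a small sketch along these lines (n = 10 is just an example value I picked) adds up exactly those costs and divides by the number of queries:

public class AmortizedCost {
    public static void main(String[] args) {
        int n = 10;                            // array holds 2^n - 1 items
        long items = (1L << n) - 1;
        long gaps = 1L << n;                   // count both ends as gaps

        long comparisons = 0;
        for (int k = 1; k <= n; k++) {
            long foundOnProbeK = 1L << (k - 1);          // 1, 2, 4, ... items
            comparisons += foundOnProbeK * (2L * k - 1); // 2k - 1 comparisons each
        }
        comparisons += gaps * 2L * n;          // each unsuccessful search costs 2n

        double amortized = (double) comparisons / (items + gaps);
        System.out.printf("amortized cost for n = %d: %.2f comparisons%n", n, amortized);
    }
}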

I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:

public static boolean binarySearch(int x, int[] sorted) {
    int s = 0; // start
    int e = sorted.length; // end (exclusive)

    // Loop invariant: if x is at sorted[k] then s <= k < e

    int mid = (s + e)/2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e)/2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}

You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements where short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.

Grind through the arithmetic, computing the amortized search cost (counting "cost" as the number of comparisons to sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where, or even whether, the item is found. Not that that's important.
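If you'd rather not grind through it by hand, a rough sketch like the following counts comparisons to sorted[mid] for both versions over every successful query and every gap; the instrumentation is mine and only illustrative, but it should show the roughly twofold difference:

public class CompareVersions {
    // Questioner's version: up to two comparisons of sorted[mid] per probe.
    static long costTwoCompare(int x, int[] sorted) {
        int s = 0, e = sorted.length - 1;
        long cost = 0;
        while (s <= e) {
            int mid = s + (e - s) / 2;
            cost++;                            // sorted[mid] == x
            if (sorted[mid] == x) return cost;
            cost++;                            // sorted[mid] < x
            if (sorted[mid] < x) s = mid + 1; else e = mid - 1;
        }
        return cost;
    }

    // This answer's version: exactly one comparison per probe, no short circuit.
    static long costOneCompare(int x, int[] sorted) {
        int s = 0, e = sorted.length;
        long cost = 0;
        int mid = (s + e) / 2;
        while (mid != s) {
            cost++;                            // sorted[mid] > x
            if (sorted[mid] > x) e = mid; else s = mid;
            mid = (s + e) / 2;
        }
        if (mid < e) cost++;                   // final sorted[mid] == x
        return cost;
    }

    public static void main(String[] args) {
        int n = (1 << 10) - 1;                 // 2^10 - 1 items
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = 2 * i + 1; // odd values; gaps at evens

        long two = 0, one = 0, queries = 0;
        for (int q = 0; q <= 2 * n; q++) {     // odd q hits an item, even q falls in a gap
            two += costTwoCompare(q, sorted);
            one += costOneCompare(q, sorted);
            queries++;
        }
        System.out.printf("two-compare version: %.2f comparisons per query%n", (double) two / queries);
        System.out.printf("one-compare version: %.2f comparisons per query%n", (double) one / queries);
    }
}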