Closest pair of points (CLRS pg 1043): Running time of splitting a sorted array into two sorted arrays

389 views Asked by At

In finding the closest pair of points in O(nlgn) time, the pseudocode for splitting a sorted list into two sorted lists (CLRS 3rd ed pg 1043) is said to run in O(n) time.

algorithm from CLRS pg 1043

However, this assumes that line 4 runs in constant time, which I find hard to believe (I'd assume it runs in O(lgn) time if it were stored as a binary tree, giving a total running time of O(nlgn).

Y is a sorted array, YL and YR are the two new sub-arrays. PL is a subset of Y in random order, and YL is the same subset, but in sorted order.

Where am I going wrong with my reasoning?

2

There are 2 answers

4
md5 On

I don't know how it is supposed to work in the book, but thinking about the way the algorithm looks like, you can come up with the following idea:

  • Y[i], X[i], YL[i], XL[i], YR[i] and XR[i] are integers, corresponding to the index of the ith-point (so you just have to store some global array which, given the index, returns the x or y coordinate).
  • PL[i] is a boolean, true if the i-th point is in the left part, false otherwise.

At each recursion step, you can compute PL[i] using y coordinates (O(n) time). Then you separate the set of points in two sets "left" and "right" using the algorithm from the book, replacing the line if Y[i] in PL by if PL[Y[i]] (such access is O(1), so in overall we get O(n)).

This has O(n) time complexity and uses O(n) memory.

Thus the closest pair problem is solved that way in T(n) = O(n log n).

1
Navjot Singh On

For simplicity sake we're assuming the list is of integers and not strings or integers which can complicate things greatly here.

There are two calculations to consider here:

  1. for loop: This runs for length of Y times, which I'm assuming is N here
  2. the tricky part - comparison of Y[i] with PL(Note: the comparison of two numbers is constant if we consider them to be of word size). Now, accessing Y[i] is constant since we're dealing with Random Access Machines. However, to compare it with an array PL of length, say, k will take k time. If this k is very small and independent of the size of input array Y, this ideally would be constant.

To write it with greater precision would mean you consider the time taken for k comparisons (length of PL) and hence, the total time of this pseudo code would be O(Nk). But, if the assumptions that k is random and independent of N hold true, it really is O(N)