I've got a set of paired items: [(a0,b0),(a1,b1),(a2,b2),...(aN,bN)]
. These pairs form a set of runs. For instance, it may be the case that b4=a1
and b1=a6
, so 4,1,6
is a run. All a
s, and all b
s, are unique, so runs are never ambiguous. Note that some runs will be of length 1.
I'd like to permute this set such that runs are consecutive. In the permuted set, for any j
, it should be the case either that aj=b(j+1)
, or that aj
is not equal to any b
. I don't care about any other aspect of the permutation, and the runs can appear in any order relative to each other.
For instance: the set [(5,2),(0,3),(1,0),(7,4),(6,5),(3,8)]
contains the runs 4,0
((6,5)
and (5,2)
), 2,1,5
((1,0)
,(0,3)
, and (3,8)
), and 3
. So a valid permutation would be [(6,5),(5,2),(1,0),(0,3),(3,8),(7,4)]
.
Clearly there's a straightforward O(n log n)
time and O(n)
space solution:
- Sort the array of pairs by
b
. - For each pair's
a
, perform a binary search over the array for a matchingb
. Record alla
s which do not match anyb
; these will be the run beginnings. - Sort the array of pairs by
a
. - For each run-beginning
a
, walk over that run, first by binary searching for the pair matching thea
, then by binary searching for that pair'sb
among thea
s, and so on, until the run ends.
Asymptotically this is probably as good as we can do without hashing. From a practical standpoint, though, I'm not entirely happy. It involves two different sorts and 2*N
binary searches. It's pretty darn branchy. This code is in an extreme inner loop (yes I've profiled) and will have a major impact on the performance of the system as a whole.
Do any alternative approaches spring to mind?
We can do away with the binary searches at least. Decorate each pair with a pointer to a linked list node (letters in this example) and sort by second number and separately by first number.
Now do a modified sorted merge.
The linked list structure looks like
which we can staple together.