F# PurelyFunctionalDataStructures WeightBiasedLeftistHeap ex 3.4

I'm working through Okasaki's Purely Functional Data Structures and building F# implementations as I go. I'm also doing the exercises listed in the book (some are pretty challenging). I'm stuck on exercise 3.4, which calls for modifying the merge function of the WeightBiasedLeftistHeap so that it executes in a single pass, as opposed to the original two-pass implementation.

I haven't been able to figure out how to do this yet and was hoping for some suggestions. There is another post here on SO where someone does it in SML, pretty much by inlining the makeT function. I started down that route (see the commented-out "exercise 3.4 first try" below) but abandoned it, because I thought it still wasn't really executing in a single pass: it recurses until it reaches a leaf, then unwinds and rebuilds the tree. Am I wrong in interpreting that as still being a two-pass merge?

Here is a link to my complete implementation of WeightBiasedLeftistHeap.

Here are my failed attempts to do this in F#:

type Heap<'a> =
| E
| T of int * 'a * Heap<'a> * Heap<'a> 

module WeightBiasedLeftistHeap =
    exception EmptyException

    let weight h =
        match h with
        | E -> 0
        | T(w, _,_,_) -> w

    let makeT x a b =
        let weightA = weight a
        let weightB = weight b
        if weightA >= weightB then
            T(weightA + weightB + 1, x, a, b)
        else
            T(weightA + weightB + 1, x, b, a)

    // exercise 3.4 first try
    //    let rec merge3_4 l r =
    //        match l,r with
    //        | l,E -> l
    //        | E,r -> r
    //        | T(_, lx, la, lb) as lh, (T(_, rx, ra, rb) as rh) ->
    //            if lx <= rx then
    //                let right = merge3_4 lb rh
    //                let weightA = weight la
    //                let weightB = weight right
    //
    //                if weightA >= weightB then
    //                    T(weightA + weightB + 1, lx, la, right)
    //                else
    //                    T(weightA + weightB + 1, lx, right, la)
    //            else
    //                let right = merge3_4 lh rb
    //                let weightA = weight ra
    //                let weightB = weight right
    //
    //                if weightA >= weightB then 
    //                    T(weightA + weightB + 1, rx, ra, right)
    //                else
    //                    T(weightA + weightB + 1, rx, right, ra)

    // exercise 3.4 second try (fail!)
    // this doesn't work; I couldn't figure out how to do this in a single pass
    let merge3_4 l r =
        let rec merge' l r value leftChild  =
            match l,r with
            | l,E -> makeT value leftChild l
            | E,r -> makeT value leftChild r
            | T(_, lx, la, lb) as lh, (T(_, rx, ra, rb) as rh) ->
                if lx <= rx then
                    merge' lb rh lx la   //(fun h -> makeT(lx, la, h))
                else
                    merge' lh rb rx ra   //(fun h -> makeT(rx, ra, h))

        match l, r with
        | l, E -> l
        | E, r -> r
        | T(_, lx, la, lb) as lh, (T(_, rx, ra, rb) as rh) ->
            let lf = fun h -> makeT(lx, la, h)
            if lx <= rx then
                merge' lb rh lx la // (fun h -> makeT(lx, la, h))
            else
                merge' lh rb rx ra // (fun h -> makeT(rx, ra, h))

    let rec merge l r =
        match l,r with
        | l,E -> l
        | E,r -> r
        | T(_, lx, la, lb) as lh, (T(_, rx, ra, rb) as rh) ->
            if lx <= rx then
                makeT lx la (merge lb rh)
            else
                makeT rx ra (merge lh rb)

    let insert3_4 x h =
        merge3_4 (T(1,x,E,E)) h

There are 2 answers

Chris Okasaki (accepted answer)

The first question is: what constitutes a "one-pass" algorithm? Something that could naturally be implemented as a single top-down loop would qualify. In contrast, recursion--compiled naively--normally has two passes, one on the way down and one on the way back up. Tail recursion can easily be compiled into a loop, and usually is in functional languages. Tail recursion modulo cons is a similar, albeit less common, optimization. But, even if your compiler doesn't support tail recursion modulo cons, you can easily convert such an implementation into a loop by hand.
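
For a concrete F# illustration of that shape (not heap-specific), consider a naive list append: the only work left after the recursive call is wrapping its result in the :: constructor, so it could be compiled, or hand-translated, into a single top-down loop.

// Illustration only: naive append is tail recursive modulo cons, because the
// recursive call is the last thing evaluated apart from the surrounding (::)
let rec append xs ys =
    match xs with
    | [] -> ys
    | x :: rest -> x :: append rest ys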

Tail recursion modulo cons is similar to ordinary tail recursion, except that the tail call is wrapped in a constructor, which can be allocated and partially filled in before the recursive call. In this case, you would want the return expressions to be something like T (1+size(a)+size(b)+size(c),x,a,merge(b,c)). The key insight required here (as mentioned in the edit on the other SO thread) is that you don't need to perform the merge to know how big the result is going to be, and therefore which side of the new tree it should go on. This is because the size of merge(b,c) will always be size(b)+size(c), which can be calculated outside the merge.

Notice that the original rank function for ordinary leftist heaps does not share this property, and so cannot be optimized in this fashion.

Essentially, then, you inline the two calls to makeT and also convert the calls of the form size(merge(b,c)) to size(b)+size(c).
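
In the F# from the question, that might look roughly like the following sketch (it assumes the Heap type and the weight function defined above; the name merge3_4' is just a placeholder):

// Sketch: makeT inlined, with the weight of the pending recursive merge
// computed up front as the sum of its arguments' weights, so each node can
// be built before the recursive call returns (tail recursion modulo cons)
let rec merge3_4' l r =
    match l, r with
    | l, E -> l
    | E, r -> r
    | (T(_, lx, la, lb) as lh), (T(_, rx, ra, rb) as rh) ->
        if lx <= rx then
            let wa = weight la
            let wb = weight lb + weight rh   // weight of (merge lb rh), known without merging
            if wa >= wb then T(wa + wb + 1, lx, la, merge3_4' lb rh)
            else T(wa + wb + 1, lx, merge3_4' lb rh, la)
        else
            let wa = weight ra
            let wb = weight lh + weight rb   // weight of (merge lh rb), known without merging
            if wa >= wb then T(wa + wb + 1, rx, ra, merge3_4' lh rb)
            else T(wa + wb + 1, rx, merge3_4' lh rb, ra)

Here the recursive call is the last thing that happens apart from being placed inside the T constructor, which is exactly the tail-recursion-modulo-cons shape described above.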

Once you make this change, the resulting function is significantly lazier than the original, because it can return the root of the result before evaluating the recursive merge.

Similarly, in a concurrent environment involving locks and mutation, the new implementation could support significantly more concurrency by acquiring and releasing locks for each node along the way, rather than locking the entire tree. (Of course, this would only make sense for very lightweight locks.)

Tomas Petricek

I'm not exactly sure if I understood the question correctly, but here is my attempt - currently, the merge operation performs a recursive call to merge (that's the first pass), and when it reaches the end of one of the heaps (the first two cases in the match), it returns the remaining heap back to the caller, and makeT is then called at each level on the way back up (that's the second pass).

I don't think that simply inlining makeT is what we're asked to do (if it were, you could just add inline to makeT and be done, without making the code any less readable :-)).

What can be done, though, is to modify the merge function to use continuation-passing style, where the "rest of the work" is passed as a function to the recursive call (so there is no pending work left on the stack after the first pass completes). This can be done like this:

let rec merge' l r cont =
    match l,r with
    | l,E -> cont l // Return the result by calling the continuation
    | E,r -> cont r // (same here)
    | T(_, lx, la, lb) as lh, (T(_, rx, ra, rb) as rh) ->
        if lx <= rx then
            // Recurse, extending the continuation: build this node with
            // 'makeT lx la' and then hand the result to the outer continuation
            merge' lb rh (fun h -> cont (makeT lx la h))
        else
            // (same here)
            merge' lh rb (fun h -> cont (makeT rx ra h))

// Using 'id' as a continuation, we just return the 
// resulting heap after it is constructed
let merge l r = merge' l r id
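
As a quick sanity check of this continuation-based merge (the heaps below are made-up examples; makeT, E and the Heap type from the question are assumed to be in scope):

let h1 = makeT 1 (makeT 3 E E) E   // heap holding 1 and 3
let h2 = makeT 2 (makeT 4 E E) E   // heap holding 2 and 4
let h  = merge h1 h2               // root carries 1; the weight field is 4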

I'm not fully convinced this is the right answer - it performs just a single pass, but the aggregated work (in the continuation) means that the pass is twice as long. However, I don't see a way to make it simpler, so it may be the right answer...