Clojure - Sliding Window Minimum in Log Time

403 views Asked by At

Given vector size n and window size k, how can I efficiently calculate the sliding window minimum in n log k time? ie, for vector [1 4 3 2 5 4 2] and window size 2, the output would be [1 3 2 2 4 2].

Obviously I can do it using partition and map but that that's n * k time.

I think I need to keep track of the minimum in a sorted map, and update the map when it's outside the window. But although I can get the min of a sorted map in log time, searching through the map to find any indexes that are expired is not log time.

Thanks.

3

There are 3 answers

3
Thumbnail On BEST ANSWER

You can solve this is with a priority queue based on Clojure's priority map data structure. We index the values in the window with their position in the vector.

  • The value of its first entry is the window minimum.
  • We add the new entry and get rid of the oldest one by key/vector-position.

A possible implementation is

(use [clojure.data.priority-map :only [priority-map]])

(defn windowed-min [k coll]
  (let [numbered (map-indexed list coll)
        [head tail] (split-at k numbered)
        init-win (into (priority-map) head)
        win-seq (reductions
                  (fn [w [i n]]
                    (-> w (dissoc (- i k)) (assoc i n)))
                  init-win
                  tail)]
    (map (comp val first) win-seq)))

For example,

(windowed-min 2 [1 4 3 2 5 4 2])
=> (1 3 2 2 4 2)

The solution is developed lazily, so can be applied to an endless sequence.


After the initialisation, which is O(k), the function computes each element in the sequence in O(log k) time, as noted here.

0
DarrylG On

You can solve in linear time --O(n), rather than O(n*log k)) as described by 1. http://articles.leetcode.com/sliding-window-maximum/ (easily change from find max to find min) and 2. https://people.cs.uct.ac.za/~ksmith/articles/sliding_window_minimum.html

The approaches needs a double ended queue to manage previous values which uses O(1) time for most queue operations (i.e. push/pop/peek, etc.) rather than O(log K) when using Priority Queue (i.e. Priority Map). I used a double ended queue from https://github.com/pjstadig/deque-clojure

Main Code to implement code in 1st reference above (for min rather than max):

(defn windowed-min-queue [w a]
  (let [
        deque-init (fn deque-init [] (reduce (fn [dq i]
                                               (dq-push-back i (prune-back a i dq)))
                                             empty-deque (range w)))

        process-min (fn process-min [dq] (reductions (fn [q i]
                                                       (->> q
                                                            (prune-back a i)
                                                            (prune-front i w)
                                                            (dq-push-back i)))
                                                     dq (range w (count a))))
        init (deque-init)
        result (process-min init)] ;(process-min init)]
    (map #(nth a (dq-front %)) result)))

Comparing the speed of this method to the other solution that uses a Priority Map we have (note: I liked the other solution since as well since its simpler).

; Test using Random arrays of data
(def N 1000000)
(def a (into [] (take N (repeatedly #(rand-int 50)))))
(def b (into [] (take N (repeatedly #(rand-int 50)))))
(def w 1024)
; Solution based upon Priority Map (see other solution which is also great since its simpler)
(time (doall (windowed-min-queue w a)))
;=> "Elapsed time: 1820.526521 msecs"
; Solution based upon  double-ended queue
(time (doall (windowed-min w b)))
;=> "Elapsed time: 8290.671121 msecs"

Which is over a 4x faster, which is great considering the PriorityMap is written in Java while the double-ended queue code is pure Clojure (see https://github.com/pjstadig/deque-clojure)

Including the other wrappers/utilities used on the double-ended queue for reference.

(defn dq-push-front [e dq]
  (conj dq e))

(defn dq-push-back [e dq]
  (proto/inject dq e))

(defn dq-front [dq]
  (first dq))

(defn dq-pop-front [dq]
  (pop dq))

(defn dq-pop-back [dq]
  (proto/eject dq))

(defn deque-empty? [dq]
  (identical? empty-deque dq))

(defn dq-back [dq]
  (proto/last dq))

(defn dq-front [dq]
  (first dq))

(defn prune-back [a i dq]
  (cond
    (deque-empty? dq) dq
    (< (nth a i) (nth a (dq-back dq))) (recur a i (dq-pop-back dq))
    :else dq))

(defn prune-front [i w dq]
  (cond
    (deque-empty? dq) dq
    (<= (dq-front dq) (- i w)) (recur i w (dq-pop-front dq))
    :else dq))
0
Scott Klarenbach On

My solution uses two auxillary maps to achieve fast performance. I map the keys to their values and also store the values to their occurrences in a sorted map. Upon each move of the window, I update the maps, and get the minimum of the sorted map, all in log time.

The downside is the code is a lot uglier, not lazy, and not idiomatic. The upside is that it outperforms the priority-map solution by about 2x. I think a lot of that though, can be blamed on the laziness of the solution above.

(defn- init-aux-maps [w v]
  (let [sv (subvec v 0 w)
        km (->> sv (map-indexed vector) (into (sorted-map)))
        vm (->> sv frequencies (into (sorted-map)))]
    [km vm]))

(defn- update-aux-maps [[km vm] j x]
  (let [[ai av] (first km)
        km (-> km (dissoc ai) (assoc j x))
        vm (if (= (vm av) 1) (dissoc vm av) (update vm av dec))
        vm (if (nil? (get vm x)) (assoc vm x 1) (update vm x inc))]
    [km vm]))

(defn- get-minimum [[_ vm]] (ffirst vm))

(defn sliding-minimum [w v]
  (loop [i 0, j w, am (init-aux-maps w v), acc []]
    (let [acc (conj acc (get-minimum am))]
      (if (< j (count v))
        (recur (inc i) (inc j) (update-aux-maps am j (v j)) acc)
        acc))))