How to find length of lazy sequence without forcing realization?


I'm currently reading the O'Reilly book Clojure Programming, which says the following in its section on lazy sequences:

It is possible (though very rare) for a lazy sequence to know its length, and therefore return it as the result of count without realizing its contents.

My question is: how is this done, and why is it so rare?

Unfortunately, the book does not elaborate on this in that section. I think it would be very useful to know the length of a lazy sequence before it is realized. For instance, the same page has an example of a lazy sequence of files that are processed with a function using map. It would be nice to know how many files will be processed before realizing the sequence.

There are 2 answers

soulcheck (best answer)

I suppose it's rare because there are usually other ways to find out the size.

The only sequence implementation I can think of right now that could do this is some kind of map of an expensive function or procedure over a collection of known size.

A simple implementation would return the size of the underlying collection, while postponing realization of the elements of the lazy sequence (and therefore execution of the expensive part) until necessary.

In that case, the size of the collection being mapped over is known beforehand, so one can use it instead of the lazy seq's size.

It might be handy sometimes, which is why it's not impossible to implement, but I'd guess it's rarely necessary.
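
To illustrate the point about using the source collection's size, here is a minimal sketch. The names files and process-file are hypothetical stand-ins (not from the book), and the sleep only simulates an expensive step.

```clojure
;; Hypothetical input: a vector of file names, whose size is known up front.
(def files ["a.txt" "b.txt" "c.txt"])

;; Hypothetical expensive step, simulated with a short sleep.
(defn process-file [name]
  (Thread/sleep 100)
  (str name ".done"))

;; map is lazy, so process-file has not been called yet.
(def processed (map process-file files))

(count files)          ;=> 3, taken from the source vector
(realized? processed)  ;=> false, nothing has been processed so far
```

Counting files rather than processed gives the same number here, because map produces exactly one element per input element.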

A. Webb

Inspired by soulcheck's answer, here is a lazy but counted map of an expensive function over a fixed-size collection.

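;; Maps f lazily over s, but remembers the size of s so that count is cheap.
(defn foo [s f]
  (let [c   (count s)   ; size of the source collection, computed up front
        res (map f s)]  ; lazy: f is not called yet
    (reify
      clojure.lang.ISeq
        (seq [_] res)                        ; hand back the underlying lazy seq
      clojure.lang.Counted
        (count [_] c)                        ; answer count without realizing res
      clojure.lang.IPending
        (isRealized [_] (realized? res)))))  ; delegate realized? to res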


(def bar (foo (range 5) (fn [x] (Thread/sleep 1000) (inc x))))

(time (count bar))
;=> "Elapsed time: 0.016848 msecs"
;    5

(realized? bar)
;=> false


(time (into [] bar))
;=> "Elapsed time: 4996.398302 msecs"
;   [1 2 3 4 5]

(realized? bar)
;=> true

(time (into [] bar))
;=> "Elapsed time: 0.042735 msecs"
;   [1 2 3 4 5]
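
For contrast, here is a small sketch of why this is rare for ordinary lazy sequences: the seq returned by a plain map does not implement clojure.lang.Counted, so count has to walk it and realizes it along the way. The names v and lazy are only illustrative.

```clojure
(def v [1 2 3])
(def lazy (map inc v))

;; counted? reports whether count is a constant-time operation.
(counted? v)      ;=> true
(counted? lazy)   ;=> false

(realized? lazy)  ;=> false
(count lazy)      ;=> 3, obtained by walking the seq
(realized? lazy)  ;=> true, counting forced realization
```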