Clojure commute and alter performance

260 views Asked by At

There is a little bit modified example from clojure.org/refs

(defn mod-nth [v i f] (assoc v i (f (v i))))
(defn run [oper nvecs nitems nthreads niters]
  (let [vec-refs (vec (map (comp ref vec)
                        (partition nitems (repeat (* nvecs nitems) 0))))
        sum  #(reduce + %)
        swap #(let [v1 (rand-int nvecs)
                    v2 (rand-int nvecs)
                    i1 (rand-int nitems)
                    i2 (rand-int nitems)]
                (dosync
                  (let [temp (nth @(vec-refs v1) i1)]
                    (oper (vec-refs v1) mod-nth i1 inc)
                    (oper (vec-refs v2) mod-nth i2 dec))))
        report #(do
                  (prn (map deref vec-refs))
                  (println "Sum:"
                    (reduce + (map (comp sum deref) vec-refs))))]
    (report)
    (dorun (apply pcalls (repeat nthreads #(dotimes [_ niters] (swap)))))
    (report)))

(time (run alter 100 10 10 100000))

sample output is

([0 0 0 0 0 0 0 0 0 0] [...])
Sum: 0
([15 -14 -8 57 -26 -12 -49 -29 33 -3] [...])
Sum: 0
"Elapsed time: 1995.938147 msecs"

Instead of swapping unique numbers i'm transferring ones from one vector elements to another.

This operation could be assumed as commutative so there is another test - it's the same except commute is used instead of alter

(time (run commute 100 10 10 100000))

with sample output like

([0 0 0 0 0 0 0 0 0 0] [...])
Sum: 0
([8 48 -10 -41 -17 -32 -4 50 -31 88] [...])
Sum: 0
"Elapsed time: 3141.591517 msecs"

Surprisingly first example is running in about 2 seconds while second needs 3 seconds

But as mentioned in this SO answer

commute is an optimized version of alter for those times when the order of things really does not matter

How it could be optimized while it needs more time to do the same work in this simple case? What is the purpose of commute ?

2

There are 2 answers

3
juan.facorro On BEST ANSWER

I used VisualVM to monitor the clojure.core functions involved when running the example using both alter and commute.

alter

alter

commute

commute

If my interpretation of the results are correct, the accumulated time spent on each function shows that commute is actually faster than alter. It seems the overhead of all the other operations that need to be done to run the code in parallel are the ones that mess up the performance.

Benchmarking code is quite tricky and using time is sometimes misleading. The information provided by VisualVm might not even be the ultimate word though, profiling and using tools such as criterium might be the best way to make sure results are trustworthy.

Another important fact is that the operations done inside the dosync block don't take that long, so even if one of them retries, the extra time this involves is not that significant. Adding a slight delay inside the dosync makes the difference between retrying (alter) and not retrying (commute) be more noticeable.

(defn mod-nth [v i f] (assoc v i (f (v i))))
(defn run [oper nvecs nitems nthreads niters]
  (let [vec-refs (vec (map (comp ref vec)
                        (partition nitems (repeat (* nvecs nitems) 0))))
        sum  #(reduce + %)
        swap #(let [v1 (rand-int nvecs)
                    v2 (rand-int nvecs)
                    i1 (rand-int nitems)
                    i2 (rand-int nitems)]
               (dosync
                 (let [temp (nth @(vec-refs v1) i1)]
                   (Thread/sleep 1)                     ;; This was added
                   (oper (vec-refs v1) mod-nth i1 inc)
                   (oper (vec-refs v2) mod-nth i2 dec))))
        report #(do
                  (prn (map deref vec-refs))
                  (println "Sum:"
                    (reduce + (map (comp sum deref) vec-refs))))]
    (doall (apply pcalls (repeat nthreads #(dotimes [_ niters] 
                                            (swap)))))))

(time (run alter 100 10 10 5000))
;= "Elapsed time: 15252.427 msecs"
(time (run commute 100 10 10 5000))
;= "Elapsed time: 13595.399 msecs"
0
Charles Duffy On

It's important to understand what the optimization made by commute is specifically: commute avoids unnecessarily rerunning the code inside your block in situations where alter would need to throw away results.

Constant-factor overhead between the implementations of commute and alter is not specified, so what you're seeing here violates no part of the Clojure specification. That said -- as the amount of time spent by individual transactions inside your dosync block grows, the penalty for using alter when you could instead use commute will grow similarly.

In general:

  • Microbenchmarks are evil (in the sense that they encourage bad practices that don't scale to real-world use). Pay attention to performance behavior in real-world scenarios, not contrived test cases.
  • Use commute in Clojure's STM whenever you can.