Learning Clojure: recursion for Hidden Markov Model

332 views Asked by At

I'm learning Clojure and started by copying the functionality of a Python program that would create genomic sequences by following an (extremely simple) Hidden Markov model.

In the beginning I stuck with my known way of serial programming and used the def keyword a lot, thus solving the problem with tons of side effects, kicking almost every concept of Clojure right in the butt. (although it worked as supposed)

Then I tried to convert it to a more functional way, using loop, recur, atom and so on. Now when I run I get an ArityException, but I can't read the error message in a way that shows me even which function throws it.

(defn create-model [name pA pC pG pT pSwitch]
; converts propabilities to cumulative prop's and returns a char
  (with-meta
    (fn []
      (let [x (rand)]
        (if (<= x pA)
          \A
          (if (<= x (+ pA pC))
            \C
            (if (<= x (+ pA pC pG))
              \G
              \T)))))                   ; the function object
    {:p-switch pSwitch :name name}))    ; the metadata, used to change model


(defn create-genome [n]
; adds random chars from a model to a string and switches models randomly
  (let [models [(create-model "std" 0.25 0.25 0.25 0.25 0.02) ; standard model, equal props
                (create-model "cpg" 0.1 0.4 0.4 0.1 0.04)]    ; cpg model
        islands (atom 0)                 ; island counter
        m (atom 0)]                      ; model index
    (loop [result (str)]
      (let [model (nth models @m)]
        (when (< (rand) (:p-switch (meta model))) ; random says "switch model!"
;          (swap! m #(mod (inc @m) 2))   ; swap model used in next iteration     
          (swap! m #(mod (inc %) 2))     ; EDIT: correction
          (if (= @m 1)                   ; on switch to model 1, increase island counter
;            (swap! islands #(inc @islands)))) ; EDIT: my try, with error
            (swap! islands inc)))) ;             EDIT: correction
        (if (< (count result) n)         ; repeat until result reaches length n
          (recur (format "%s%c" result (model)))
          result)))))

Running it works, but calling (create-genome 1000) leads to

ArityException Wrong number of args (1) passed to: user/create-genome/fn--772  clojure.lang.AFn.throwArity (AFn.java:429)

My questions:

  • (obviously) what am I doing wrong?
  • how exactly do I have to understand the error message?

Information I'd be glad to receive

  • how can the code be improved (in a way a clojure-newb can understand)? Also different paradigms - I'm grateful for suggestions.
  • Why do I need to put pound-signs # before the forms I use in changing the atoms' states? I saw this in an example, the function wouldn't evaluate without it, but I don't understand :)
3

There are 3 answers

1
soulcheck On BEST ANSWER

Ok, it's a long shot, but it looks like your atom-updating functions:

#(mod (inc @m) 2)

and

#(inc @islands)

are of 0-arity, and they should be of arity at least 1.

This leads to the answer to your last question: the #(body) form is a shortcut for (fn [...] (body)). So it creates an anonymous function. Now the trick is that if body contains % or %x where x is a number, the position where it appears will be substituted for the referece to the created function's argument number x (or the first argument if it's only %).

In your case that body doesn't contain references to the function arguments, so #() creates an anonymous function that takes no arguments, which is not what swap! expects.

So swap tries to pass an argument to something that doesn't expect it and boom!, you get an ArityException.

What you really needed in those cases was:

(swap! m #(mod (inc %) 2)) ; will swap value of m to (mod (inc current-value-of-m) 2) internally

and

(swap! islands inc) ; will swap value of islands to (inc current-value-of-islands) internally

respectively

1
Dominykas Mostauskis On

Your mistake has to do with what you asked about the hashtag macro #.

#(+ %1 %2) is shorthand for (fn [x y] (+ x y)). It can be without arguments too: #(+ 1 1). That's how you are using it. The error you are getting is because swap! needs a function that accepts a parameter. What it does is pass the atom's current value to your function. If you don't care about its state, use reset!: (reset! an-atom (+ 1 1)). That will fix your error.

Correction:

I just took a second look at your code and realised that you are actually using working on the atoms' states. So what you want to do is this:

(swap! m #(mod (inc %) 2)) instead of (swap! m #(mod (inc @m) 2)).

As far as style goes, you are doing good. I write my functions differently every day of the week, so maybe I'm not one to give advice on that.

3
Magos On

Since you asked for ways to improve, here's one approach I often find myself going to : Can I abstract this loop into a higher order pattern?

In this case, your loop is picking characters at random - this can be modelled as calling a fn of no arguments that returns a character - and then accumulating them together until it has enough of them. This fits very naturally into repeatedly, which takes functions like that and makes lazy sequences of their results to whatever length you want.

Then, because you have the entire sequence of characters all together, you can join them into a string a little more efficiently than repeated formats - clojure.string/join should fit nicely, or you could apply str over it.

Here's my attempt at such a code shape - I tried to also make it fairly data-driven and that may have resulted in it being a bit arcane, so bear with me:

(defn make-generator 
  "Takes a probability distribution, in the form of a map 
  from values to the desired likelihood of that value appearing in the output.
  Normalizes the probabilities and returns a nullary producer fn with that distribution."
  [p-distribution]  
  (let[sum-probs (reduce + (vals p-distribution))
       normalized (reduce #(update-in %1 [%2] / sum-probs) p-distribution (keys p-distribution) )]
      (fn [] (reduce 
              #(if (< %1 (val %2)) (reduced (key %2)) (- %1 (val %2))) 
              (rand) 
              normalized))))

(defn markov-chain 
  "Takes a series of states, returns a producer fn.
  Each call, the process changes to the next state in the series with probability :p-switch,
  and produces a value from the :producer of the current state."
  [states]
  (let[cur-state (atom (first states))
       next-states (atom (cycle states))]
    (fn [] 
      (when (< (rand) (:p-switch @cur-state))
        (reset! cur-state (first @next-states))
        (swap! next-states rest))
      ((:producer @cur-state)))))


(def my-states [{:p-switch 0.02 :producer (make-generator {\A 1 \C 1 \G 1 \T 1})  :name "std"}
                {:p-switch 0.04 :producer (make-generator {\A 1 \C 4 \G 4 \T 1})  :name "cpg"}])


(defn create-genome [n]
  (->> my-states
       markov-chain
       (repeatedly n)
       clojure.string/join))

To hopefully explain a little of the complexity:

  • The let in make-generator is just making sure the probabilities sum to 1.
  • make-generator makes heavy use of another higher-order looping pattern, namely reduce. This essentially takes a function of 2 values and threads a collection through it. (reduce + [4 5 2 9]) is like (((4 + 5) + 2) + 9). I chiefly use it to do a similar thing to your nested ifs in create-model, but without naming how many values are in the probability distribution.
  • markov-chain makes two atoms, cur-state to hold the current state and next-states, which holds an infinite sequence (from cycle) of the next states to switch to. This is to work like your m and models, but for arbitrary numbers of states.
  • I then use when to check if the random state switch should occur, and if it does perform the two side effects I need to keep the state atoms up to date. Then I just call the :producer of @cur-state with no arguments and return that.

Now obviously, you don't have to do this exactly this way, but looking for those patterns certainly does tend to help me.
If you want to go even more functional, you could also consider moving to a design where your generators take a state (with seeded random number generator) and return a value plus a new state. This "state monad" approach would make it possible to be fully declarative, which this design isn't.