Idiomatic exceptions for exiting loops in OCaml

1.2k views Asked by At

In OCaml, imperative-style loops can be exited early by raising exceptions.

While the use of imperative loops is not idiomatic per se in OCaml, I'd like to know what are the most idiomatic ways to emulate imperative loops with early exits (taking into account aspects such as performance, if possible).

For instance, an old OCaml FAQ mentions exception Exit:

Exit: used to jump out of loops or functions.

Is it still current? The standard library simply mentions it as a general-purpose exception:

The Exit exception is not raised by any library function. It is provided for use in your programs.

Relatedly, this answer to another question mentions using a precomputed let exit = Exit exception to avoid allocations inside the loop. Is it still required?

Also, sometimes one wants to exit from the loop with a specific value, such as raise (Leave 42). Is there an idiomatic exception or naming convention to do this? Should I use references in this case (e.g. let res = ref -1 in ... <loop body> ... res := 42; raise Exit)?

Finally, the use of Exit in nested loops prevents some cases where one would like to exit several loops, like break <label> in Java. This would require defining exceptions with different names, or at least using an integer to indicate how many scopes should be exited (e.g. Leave 2 to indicate that 2 levels should be exited). Again, is there an approach/exception naming that is idiomatic here?

4

There are 4 answers

1
antron On BEST ANSWER

As originally posted in comments, the idiomatic way to do early exit in OCaml is using continuations. At the point where you want the early return to go to, you create a continuation, and pass it to the code that might return early. This is more general than labels for loops, since you can exit from just about anything that has access to the continuation.

Also, as posted in comments, note the usage of raise_notrace for exceptions whose trace you never want the runtime to generate.

A "naive" first attempt:

module Continuation :
sig
  (* This is the flaw with this approach: there is no good choice for
     the result type. *)
  type 'a cont = 'a -> unit

  (* with_early_exit f passes a function "k" to f. If f calls k,
     execution resumes as if with_early_exit completed
     immediately. *)
  val with_early_exit : ('a cont -> 'a) -> 'a
end =

struct
  type 'a cont = 'a -> unit

  (* Early return is implemented by throwing an exception. The ref
     cell is used to store the value with which the continuation is
     called - this is a way to avoid having to generate an exception
     type that can store 'a for each 'a this module is used with. The
     integer is supposed to be a unique identifier for distinguishing
     returns to different nested contexts. *)
  type 'a context = 'a option ref * int64
  exception Unwind of int64

  let make_cont ((cell, id) : 'a context) =
    fun result -> cell := Some result; raise_notrace (Unwind id)

  let generate_id =
    let last_id = ref 0L in
    fun () -> last_id := Int64.add !last_id 1L; !last_id

  let with_early_exit f =
    let id = generate_id () in
    let cell = ref None in
    let cont : 'a cont = make_cont (cell, id) in
    try
      f cont
    with Unwind i when i = id ->
      match !cell with
      | Some result -> result
        (* This should never happen... *)
      | None        -> failwith "with_early_exit"
end



let _ =
  let nested_function i k = k 15; i in

  Continuation.with_early_exit (nested_function 42)
  |> string_of_int
  |> print_endline

As you can see, the above implements early exit by hiding an exception. The continuation is actually a partially applied function that knows the unique id of the context for which it was created, and has a reference cell to store the result value while the exception is being thrown to that context. The code above prints 15. You can pass the continuation k as deep as you want. You can also define the function f immediately at the point where it is passed to with_early_exit, giving an effect similar to having a label on a loop. I use this very often.

The problem with the above is the result type of 'a cont, which I arbitrarily set to unit. Actually, a function of type 'a cont never returns, so we want it to behave like raise – be usable where any type is expected. However, this doesn't immediately work. If you do something like type ('a, 'b) cont = 'a -> 'b, and pass that down to your nested function, the type checker will infer a type for 'b in one context, and then force you to call continuations only in contexts with the same type, i.e. you won't be able to do things like

(if ... then 3 else k 15)
...
(if ... then "s" else k 16)

because the first expression forces 'b to be int, but the second requires 'b to be string.

To solve this, we need to provide a function analogous to raise for early return, i.e.

(if ... then 3 else throw k 15)
...
(if ... then "s" else throw k 16)

This means stepping away from pure continuations. We have to un-partially-apply make_cont above (and I renamed it to throw), and pass the naked context around instead:

module BetterContinuation :
sig
  type 'a context

  val throw : 'a context -> 'a -> _
  val with_early_exit : ('a context -> 'a) -> 'a
end =

struct
  type 'a context = 'a option ref * int64
  exception Unwind of int64

  let throw ((cell, id) : 'a context) =
    fun result -> cell := Some result; raise_notrace (Unwind id)

  let generate_id = (* Same *)

  let with_early_exit f =
    let id = generate_id () in
    let cell = ref None in
    let context = (cell, id) in
    try
      f context
    with Unwind i when i = id ->
      match !cell with
      | Some result -> result
      | None        -> failwith "with_early_exit"
end



let _ =
  let nested_function i k = ignore (BetterContinuation.throw k 15); i in

  BetterContinuation.with_early_exit (nested_function 42)
  |> string_of_int
  |> print_endline

The expression throw k v can be used in contexts where different types are required.

I use this approach pervasively in some big applications I work on. I prefer it even to regular exceptions. I have a more elaborate variant, where with_early_exit has a signature roughly like this:

val with_early_exit : ('a context -> 'b) -> ('a -> 'b) -> 'b

where the first function represents an attempt to do something, and the second represents the handler for errors of type 'a that may result. Together with variants and polymorphic variants, this gives a more explicitly-typed take on exception handling. It is especially powerful with polymorphic variants, as the set of error variands can be inferred by the compiler.

The Jane Street approach effectively does the same as what is described here, and in fact I previously had an implementation that generated exception types with first-class modules. I am not sure anymore why I eventually chose this one – there may be subtle differences :)

0
ivg On

Exit is ok (I'm not sure whether I can say that it is idiomatic). But, make sure, that you're using raise_notrace, if you're using recent enough compiler (since 4.02).

Even better solution, is to use with_return from OCaml Core library. It will not have any problems with scope, because it will create a fresh new exception type for each nesting.

Of course, you can achieve the same results, or just take the source code of Core's implementation.

And even more idiomatic, is not to use exceptions for short-circuiting your iteration, and consider to use existing algorithm (find, find_map, exists, etc) or just write a recursive function, if no algorithm suits you.

0
anol On

Just to answer a specific part of my question which was not mentioned in other answers:

... using a precomputed let exit = Exit exception to avoid allocations inside the loop. Is it still required?

I did some micro-benchmarks using Core_bench on 4.02.1+fp and the results indicate no significant difference: when comparing two identical loops, one containing a local exit declared before the loop and another one without it, the time difference is minimal.

The difference between raise Exit and raise_notrace Exit in this example was also minimal, about 2% in some runs, up to 7% in others, but it could well be within the error margins of such a short experiment.

Overall, I couldn't measure any noticeable difference, so unless someone would have examples where Exit/exit significantly affect performance, I would prefer the former since it is clearer and avoids creating a mostly useless variable.

Finally, I also compared the difference between two idioms: using a reference to a value before exiting the loop, or creating a specific exception type containing the return value.

With reference to result value + Exit:

 let res = ref 0 in
 let r =
   try
     for i = 0 to n-1 do
       if a.(i) = v then
        (res := v; raise_notrace Exit)
     done;
     assert false
   with Exit -> !res
 in ...

With specific exception type:

 exception Res of int
 let r =
   try
     for i = 0 to n-1 do
       if a.(i) = v then
         raise_notrace (Res v)
     done;
     assert false
   with Res v -> v
 in ...

Once again, the differences were minimal and varied a lot between runs. Overall, the first version (reference + Exit) seemed to have a slight advantage (0% to 10% faster), but the difference was not significant enough to recommend one version over the another.

Since the former requires defining an initial value (which may not exist) or using an option type to initialize the reference, and the latter requires defining a new exception per type of value returned from the loop, there is no ideal solution here.

0
byako On

Regarding the point

using a precomputed let exit = Exit exception to avoid allocations inside the loop. Is it still required?

the answer is no with sufficiently recent versions of OCaml. Here is the relevant excerpt from the Changelog of OCaml 4.02.0.

  • PR#6203: Constant exception constructors no longer allocate (Alain Frisch)

Here is PR6203: http://caml.inria.fr/mantis/view.php?id=6203