When are physically distinct values created in OCaml?

224 views Asked by At

I'm trying to understand what the physical equality operators (Pervasives.(==) and Pervasives.(!=)) mean in OCaml.

The language manual says that the expression "" is a "constant", not an "expression":

6.5 Constants

constant ::== ... string-literal

but I can't find any verbiage indicating that constants are singly/pre-evaluated or pooled, and the REPL indicates that mutable string values are (thankfully) not pooled.

(* a *)  ""          == "";;             (* false *)
(* b *)  "foo"       == "foo";;          (* false *)
(* c *)  ""          == String.copy "";; (* false *)
(* d *)  ()          == ();;             (* true  *)
(* e *)  (42, -42)   == (42, -42);;      (* false *)
(* f *)  ("", 1)     == ("", 1);;        (* false *)
(* g *)  None        == None;;           (* true  *)
(* h *)  (Some None) == (Some None);;    (* false *)

Section "19.3 Representation of OCaml data types" suggests that the language specification requires that bools, ints, chars, the unit value, simple variants, and empty lists are selfless.

Does an implementation have to behave as above to be a complying OCaml implementation?

Can a complying OCaml implementation rewrite the pointer at b to point to a when a = b (* structurally *) is true and both are values of an immutable type (or effectively immutable values like zero length strings/arrays) as is sometimes done to reduce the number of reachable younger values in a generational GC?

2

There are 2 answers

2
Jeffrey Scofield On BEST ANSWER

As I read the language spec, there are very few guarantees about when values are distinct. I believe the only guarantee is in the documentation of the Pervasives module:

On mutable types such as references, arrays, strings, records with mutable fields and objects with mutable instance variables, e1 == e2 is true if and only if physical modification of e1 also affects e2. On non-mutable types, the behavior of ( == ) is implementation-dependent; however, it is guaranteed that e1 == e2 implies compare e1 e2 = 0.

One of the cool things about FP is that the compiler and runtime are free to do arbitrarily clever things with immutable values. So this guarantee is about all you would really want to have (IMHO).

In sum, yes, the runtime or the compiler is free (again, IMHO) to share (and not share) immutable values any way it thinks will be helpful.

I wouldn't interpret the representation section as part of the language specification. It's just useful documentation of the current implementation.

0
gasche On

Just a remark: String constants are pooled in a different way than what you test:

let f () = "foo"
let b = f () == f () (* true *)

This may indeed lead to bugs if you mutate the output of a f () call: this will affect all further calls as well. The consensus on that behavior is that mutable string are an historical mistake (one should have a mutable buffer type distinct from the main string type, on which encoding choices and concatenation complexity should be more important) and that the semantics-breaking pooling is interesting enough performance-wise to be allowed to assume that string constants are not mutated. If one want to avoid pooling, one should just call a String.copy directly on the string constant.