I am new to OCaml. And I am writing a simple assembler compiler. I have the following types which I use in my AST.
type asmreg8
type asmreg16
type asmreg32
type asmreg64
type asmreg128
type _ reg =
| Reg8 : string -> asmreg8 reg
| Reg16 : string -> asmreg16 reg
| Reg32 : string -> asmreg32 reg
| Reg64 : string -> asmreg64 reg
| Reg128 : string -> asmreg128 reg
type asmlabel
type _ const = ASMConst of string const
type _ var = ASMVar of string var
type _ label = ASMLabel of asmlabel label
type double_arg =
| RegToReg : 'a reg * 'a reg -> double_arg
| RegToConst : 'a reg * 'b const -> double_arg
type mnemonic = Mov | Add | Inc | Ret
type single_arg =
| Var : _ var -> single_arg
| Reg : _ reg -> single_arg
| Label : _ label -> single_arg
type command =
| Args2 of mnemonic * double_arg
| Args1 of mnemonic * single_arg
| Args0 of mnemonic
type code_section = Command of command | Id of asmlabel label
type data_type = DB | DW | DD | DQ | DT
type var = string * data_type * string list
type dir = Code of code_section list | Data of var list
I think types are good because they forbid operations on different (by bits) registers. But now I have a problem. I do not know how to write a function like the following
let string_to_reg = function
| "something" -> Reg8 "something"
| "Reg16" -> Reg16 "16"
| _ -> Reg32 "123"
I just do not know how to get a function that takes a string (or another type that does not explicitly contain 'a) and returns values of type 'a reg where 'a can be asmreg8, asmreg16 ets. I have read several articles about GADT and phantom types, but have not found an example that suits me. I'm starting to think I'm missing something about types and that things should be a lot simpler.
I tried this code `
let string_to_reg : type a. string -> a reg = function
| "something" -> Reg8 "something"
| "Reg16" -> Reg16 "16"
| _ -> Reg32 "123"
but I get
This expression has type asmreg8 reg but an expression was expected of type
a reg
Type asmreg8 is not compatible with type a
I thought I can just show to compiler that the function returns a polymorphic type. I tried this
let string_to_reg : string -> 'a reg = function
| "something" -> Reg8 "something"
| "Reg16" -> Reg16 "16"
| _ -> Reg32 "123"
but I get a string -> asmreg8 reg function and it can not return value of asmreg16 reg type.
I tried pack my _ ret types by another type
type asmreg =
| ASM8 : asmreg8 reg -> asmreg
| ASM16 : asmreg16 reg -> asmreg
| ASM32 : asmreg32 reg -> asmreg
| ASM64 : asmreg64 reg -> asmreg
| ASM128 : asmreg128 reg -> asmreg
let reg_to_asmreg : type a. asmreg -> a reg = function
| ASM8 x -> Reg8 x
| ASM16 x -> Reg16 x
| _ -> Reg32 "123"
but I have a same problem. I can not write a function that returns values of different reg types.
It is not possible to write the function
string_to_reg
directlybecause the type of the function would not be
'a. string -> 'a reg
. Indeed this type means that the function returns a register of all possible type simultaneously which is not what you meant. The correct type of the result of the function would be something like(x:string) -> reg_type(x)
in a dependently typed language. In other words, you want the type of the result to depend on the value of its argument. This is not possible in OCaml (in the core language).This is common issue when interfacing a strongly typed GADT DSLs with the exterior world. The solution is to use another layer of GADTs to convey the information that the type of the parsed register cannot be known in advance:
Here, if I have a value
Dyn x
, I know that there is some type'e
such that the type of x isx:'e reg
, but I don't know anything more about this type'e
(this kind of type variable'e
is said to be existentially quantified type). Thus I can only apply function onx
that work for any'e reg
.Since
reg
is a GADTs, the possible values for'e reg
are well-known, so there are still many useful function of type'e. 'e reg -> ...
and it should still possible to use the parsed register.An important point to notice however is that this
dyn_reg
type has mostly erased all fine-grained type information that was available on the GADTs definition, and whenever we are usingdyn_reg
, we are in fact using an encoding of the normal variants. Thus, if you end up using onlydyn_reg
, it is a sign that GADTs were not really useful in this case.EDIT:
A good example of the consequence of erasing the types is to try to write a move function. Just writing
fails because the type of
a
andb
might not be compatible. However, once we check that the types are compatible, the function compiles as expected