URMs or: Why I Can’t Live Without ADTs
In a class at university I recently learned about computable functions, which are functions whose return values can be computed by a Turing machine. Our teacher, however, chose to define them as functions that can be computed by a unlimited register machine (URM). For those of you who don’t know it already, a URM is a theoretic, register-based machine architecture. Apparently URMs are Turing complete, so his choice shouldn’t make much of a difference.
A URM has an infinite register set and the instructions set is as follows:
- transfers the contents of to
- transfers 0 to
- increments the value of
- jumps to the -th instruction if the values of and coincide
There’s this cool URM simulator if you’re interested in playing with this. Here’s an example of a URM program that computes and stores it in .
Z(3)
Z(4)
J(2, 3, 8)
J(1, 4, 10)
S(3)
S(4)
J(1, 1, 3)
Z(3)
J(1, 1, 3)
This implementation was obtained from the following implementation in x64 assembly
mod:
mov 0, rcx
mov 0, rdx
f:
cmp rbx, rcx
je f_first_case
cmp rax, rdx
je f_second_case
f_third_case:
inc rcx
inc rdx
jmp f
f_first_case:
mov 0, rcx
jmp f
f_second_case:
halt
which was itself obtained from the following Haskell implementation.
mod :: Natural -> Natural -> Natural
mod a n = modTail 0 0
where
modTail acc c
| acc == n = modTail 0 c
| c == a = acc
| otherwise = modTail (acc + 1) (c + 1)
Anyway, as you can see I took this as an opportunity to experiment with some of the languages mentioned in a previous article of mine. One of the experiments I did was to implement a URM virtual machine in OCaml, and this is what I came up with:
open Array
type nat = Zero | Succ of nat
type instruction =
| T of nat * nat
| Z of nat
| S of nat
| J of nat * nat * int
(* A function nat -> nat is used to represent the sequence of all registers *)
type machine =
{ instruction_pointer : int;
registers : nat -> nat;
}
(* Executes the next instruction of a program *)
let step (program : instruction array) (m : machine) : machine =
if m.instruction_pointer >= length program then m
else
match program.(m.instruction_pointer) with
| T (r1, r2) ->
{ registers =
(fun n -> if n = r2 then m.registers r1 else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| Z r ->
{ registers = (fun n -> if n = r then Zero else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| S r ->
{ registers =
(fun n -> if n = r then Succ (m.registers n) else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| J (r1, r2, i) when m.registers r1 = m.registers r2 ->
{ m with instruction_pointer = i }
| J _ ->
{ m with instruction_pointer = m.instruction_pointer + 1 }
Implementing this was a lot of fun, so I naturally wondered if any more fun could be derived from implementing it in different languages or paradigms. Unfortunately, as far as I can tell this is not the case 😢
Notice that most of the program consists of a single match
statement. This is
a very natural way to describe this problem: switch through all the possible
instructions and update the machine accordingly. Indeed, this is the only way
to describe the problem I was able to come up with!
I tried to write equivalent programs in multiple languages such as Racket and
C, but I invariably ended up trying to emulate the match
statement above:
algebraic data types (ADTs) are the only sane way to describe the type of
instructions as far as I can tell. Hence OCaml — or any other language from
the ML family — is a natural choice, and in some sense the only possible
choice.
I guess what I am trying to say is that records and C-style enums are cool, but nothing beats ADTs when it comes to domain-modeling. Languages that do not support ADTs are necessarily less expressive than the ones that do. For instance compare
type instruction =
| T of nat * nat
| Z of nat
| S of nat
| J of nat * nat * int
with the equivalent tagged union in C:
struct instruction {
enum { T, Z, S, J } tag;
union {
struct { unsigned int r1; unsigned int r2; } t;
unsigned int z;
unsigned int s;
struct { unsigned int r1; unsigned int r2; int i; } j;
} value;
}
Anyway, if you do know a different way of solving this problem — in any language — please let me know! I guess this is all I had to say…