# URMs or: Why I Can’t Live Without ADTs

In a class at university I recently learned about *computable functions*, which
are functions whose return values can be computed by a Turing machine. Our
teacher, however, chose to define them as functions that can be computed by a
*unlimited register machine* (URM). For those of you who don’t know it
already, a URM is a theoretic, register-based machine architecture. Apparently
URMs are Turing complete, so his choice shouldn’t make much of a difference.

A URM has an infinite register set $(R_0, R_1, \ldots)$ and the instructions set is as follows:

- $T(n, m)$ transfers the contents of $R_n$ to $R_m$
- $Z(n)$ transfers 0 to $R_n$
- $S(n)$ increments the value of $R_n$
- $J(n, m, i)$ jumps to the $i$-th instruction if the values of $R_n$ and $R_m$ coincide

There’s this cool URM simulator if you’re interested in playing with this. Here’s an example of a URM program that computes $R_1 \; \operatorname{mod} \; R_2$ and stores it in $R_3$.

```
Z(3)
Z(4)
J(2, 3, 8)
J(1, 4, 10)
S(3)
S(4)
J(1, 1, 3)
Z(3)
J(1, 1, 3)
```

This implementation was obtained from the following implementation in x64 assembly

```
mod:
mov 0, rcx
mov 0, rdx
f:
cmp rbx, rcx
je f_first_case
cmp rax, rdx
je f_second_case
f_third_case:
inc rcx
inc rdx
jmp f
f_first_case:
mov 0, rcx
jmp f
f_second_case:
halt
```

which was itself obtained from the following Haskell implementation.

```
mod :: Natural -> Natural -> Natural
mod a n = modTail 0 0
where
modTail acc c
| acc == n = modTail 0 c
| c == a = acc
| otherwise = modTail (acc + 1) (c + 1)
```

Anyway, as you can see I took this as an opportunity to experiment with some of the languages mentioned in a previous article of mine. One of the experiments I did was to implement a URM virtual machine in OCaml, and this is what I came up with:

```
open Array
type nat = Zero | Succ of nat
type instruction =
| T of nat * nat
| Z of nat
| S of nat
| J of nat * nat * int
(* A function nat -> nat is used to represent the sequence of all registers *)
type machine =
{ instruction_pointer : int;
registers : nat -> nat;
}
(* Executes the next instruction of a program *)
let step (program : instruction array) (m : machine) : machine =
if m.instruction_pointer >= length program then m
else
match program.(m.instruction_pointer) with
| T (r1, r2) ->
{ registers =
(fun n -> if n = r2 then m.registers r1 else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| Z r ->
{ registers = (fun n -> if n = r then Zero else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| S r ->
{ registers =
(fun n -> if n = r then Succ (m.registers n) else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| J (r1, r2, i) when m.registers r1 = m.registers r2 ->
{ m with instruction_pointer = i }
| J _ ->
{ m with instruction_pointer = m.instruction_pointer + 1 }
```

Implementing this was a lot of fun, so I naturally wondered if any more fun could be derived from implementing it in different languages or paradigms. Unfortunately, as far as I can tell this is not the case 😢

Notice that most of the program consists of a single `match`

statement. This is
a very natural way to describe this problem: switch through all the possible
instructions and update the machine accordingly. Indeed, this is the only way
to describe the problem I was able to come up with!

I tried to write equivalent programs in multiple languages such as Racket and
C, but I invariably ended up trying to emulate the `match`

statement above:
algebraic data types (ADTs) are the only sane way to describe the type of
instructions as far as I can tell. Hence OCaml — or any other language from
the ML family — is a natural choice, and in some sense *the only possible
choice*.

I guess what I am trying to say is that records and C-style enums are cool, but
nothing beats ADTs when it comes to domain-modeling. Languages that do not
support ADTs are *necessarily* less expressive than the ones that do. For
instance compare

```
type instruction =
| T of nat * nat
| Z of nat
| S of nat
| J of nat * nat * int
```

with the equivalent tagged union in C:

```
struct instruction {
enum { T, Z, S, J } tag;
union {
struct { unsigned int r1; unsigned int r2; } t;
unsigned int z;
unsigned int s;
struct { unsigned int r1; unsigned int r2; int i; } j;
} value;
}
```

Anyway, if you do know a *different* way of solving this problem — in any
language — please let me know! I guess this is all I had to say…