# URMs or: Why I Can’t Live Without ADTs

In a class at university I recently learned about
*computable functions*, which are functions whose return values
can be computed by a Turing machine. Our teacher, however, chose to
define them as functions that can be computed by a
*unlimited register machine* (URM). For those of you who
don’t know it already, a URM is a theoretic, register-based
machine architecture. Apparently URMs are Turing complete, so his choice
shouldn’t make much of a difference.

A URM has an infinite register set $\left({R}_{0},{R}_{2},\dots \right)$ and the instructions set is as follows:

- $T\left(n,m\right)$ transfers the contents of ${R}_{n}$ to ${R}_{m}$
- $Z\left(n\right)$ transfers 0 to ${R}_{n}$
- $S\left(n\right)$ increments the value of ${R}_{n}$
- $J\left(n,m,i\right)$ jumps to the $i$-th instruction if the values of ${R}_{n}$ and ${R}_{m}$ coincide

There’s this cool URM simulator if you’re interested in playing with this. Here’s an example of a URM program that computes ${R}_{1}\mathrm{mod}{R}_{2}$ and stores it in $R3$.

```
Z(3)
Z(4)
J(1, 4, 11)
J(2, 3, 8)
S(3)
S(4)
J(1, 1, 3)
Z(3)
S(4)
J(1, 1, 3)
```

This implementation was obtained from the following implementation in x64 assembly

```
mod:
mov 0, rcx
mov 0, rdx
f:
cmp rax, rdx
je end
cmp rbx, rcx
je f_second_case
f_third_case:
inc rcx
inc rdx
jmp f
f_second_case:
mov 0, rcx
inc rdx
jmp f
end:
halt
```

which was itself obtained from the following Haskell implementation.

```
mod :: Natural -> Natural -> Natural
mod a n = f 0 0
where f acc c
| c == a = acc
| acc == n = f 0 (c + 1)
| otherwise = f (acc + 1) (c + 1)
```

Anyway, as you can see I took this as an opportunity to experiment with some of the languages mentioned in a previous article of mine. One of the experiments I did was to implement a URM virtual machine in OCaml, and this is what I came up with:

```
open Array
type nat = Zero | Succ of nat
type instruction = T of nat * nat
| Z of nat
| S of nat
| J of nat * nat * int
(* A function nat -> nat is used to represent the sequence of all registers *)
type machine =
{ instruction_pointer : int;
registers : nat -> nat;
}
(* Executes the next instruction of a program *)
let step (program : instruction array) (m : machine) : machine =
if m.instruction_pointer >= length program then m
else
match program.(m.instruction_pointer) with
| T (r1, r2) ->
{ registers =
(fun n -> if n == r2 then m.registers r1 else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| Z r ->
{ registers = (fun n -> if n == r then Zero else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| S r ->
{ registers =
(fun n -> if n == r then Succ (m.registers n) else m.registers n);
instruction_pointer = m.instruction_pointer + 1;
}
| J (r1, r2, i) when m.registers r1 == m.registers r2 ->
{ m with instruction_pointer = i }
| J _ ->
{ m with instruction_pointer = m.instruction_pointer + 1 }
```

Implementing this was a lot of fun, so I naturally wondered if any more fun could be derived from implementing it in different languages or paradigms. Unfortunately, as far as I can tell this is not the case 😢

Notice that most of the program consists of a single
`match`

statement. This is a very natural way
to describe this problem: switch through all the possible instructions
and update the machine accordingly. Indeed, this is the only way to
describe the problem I was able to come up with!

I tried to write equivalent programs in multiple languages such as
Racket and C, but I invariably ended up trying to emulate the
`match`

statement above: algebraic data types
(ADTs) are the only sane way to describe the type of instructions as far
as I can tell. Hence OCaml — or any other language
from the ML family — is a natural choice, and in some
sense *the only possible choice*.

I guess what I am trying to say is that records and C-style enums are
cool, but nothing beats ADTs when it comes to domain-modeling. Languages
that do not support ADTs are *necessarily* less expressive than
the ones that do. For instance compare

```
type instruction = T of nat * nat
| Z of nat
| S of nat
| J of nat * nat * int
```

with the equivalent tagged union in C:

```
struct instruction {
enum { T, Z, S, J } tag;
union {
struct { unsigned int r1; unsigned int r2; } t;
unsigned int z;
unsigned int s;
struct { unsigned int r1; unsigned int r2; int i; } j;
} value;
}
```

Anyway, if you do know a *different* way of solving this
problem — in any language — please
let me know! I guess this is all I had to say…