Chapter 4Standard ML

Datatypes

Algebraic types: case-of, constructors

Concept

A datatype (Def 3.1 in the slides, Chapter 5 section 3) is SML's mechanism for defining algebraic / inductive types. A datatype declaration lists constructors that build values of the type.

Simple example — enumeration:
datatype day = mon | tue | wed | thu | fri | sat | sun;

This creates seven nullary constructors; day is a type with exactly seven values. Functions over an enumeration use complete pattern matching:

fun is_weekend sat = true
  | is_weekend sun = true
  | is_weekend _ = false;

Constructors with arguments:

datatype shape =
  Circle of real
  | Square of real
  | Triangle of real * real * real;

This is the geometric-shapes example. Circle r is a value, Triangle(3.0, 4.0, 5.0) is a value, etc.

Computing on a datatype — area:

fun area (Circle r) = Math.pi * r * r
  | area (Square a) = a * a
  | area (Triangle(a, b, c)) =
      let val s = (a + b + c) / 2.0
      in Math.sqrt(s * (s - a) * (s - b) * (s - c)) end;

The Maybe pattern:
datatype 'a option = NONE | SOME of 'a;

This is the standard way to handle partial functions: NONE when there is no result, SOME x when there is. Pattern-matching on the constructor forces you to handle both cases. (This is exactly how SML's library represents results of TextIO.input1 and similar partial operations.)

Animation — sml datatypes

Transcript — click a line to jump10 cues

0.0sAlgebraic Datatypes
1.4sType label appears: datatype tree = ...
2.0sTokens reveal: datatype tree = Empty |
4.8sSecond line: Node of int times tree times tree
6.1sConstructors Empty and Node are boxed amber
7.5sSample tree label and a sample expression appear
9.3sEdges drawn, root node 1 fades in
10.3sLeft child node 2 fades in
11.1sThree empty leaves appear with slashes
12.4sEmpty is the base case; Node recursively builds

Inductively defined sets: no confusion, no junk

An inductively defined set (Def 2.8) on a family of sets $N_1, \ldots, N_k$ has the form

S \subseteq N_1 \cup \cdots \cup N_k

where each

N_i

is the image of a constructor

c_i: T_i \to S

for some 'type'

T_i

over

S

itself. We require:

Closure: Every $c_i(t)$ for $t \in T_i$ is in $S$ .
No internal confusion (injectivity): Different inputs $t_1 \neq t_2$ give $c_i(t_1) \neq c_i(t_2)$ — within the same constructor.
No inter-constructor confusion: Images of different constructors are disjoint: $\text{im}(c_i) \cap \text{im}(c_j) = \emptyset$ for $i \neq j$ .
No junk: Every element of $S$ is reachable via the constructors.

Example: $\mathbb{N}_1 = \langle \mathbb{N}, \{s, o\} \rangle$ with $o = 0$ and $s(n) = n+1$ . The four conditions are exactly the Peano axioms (Def 1.6).

In SML, this is exactly what datatype declarations provide:


datatype mynat = zero | suc of mynat;

The 'no confusion' rules mean pattern matching is sound: case e of zero => ... | suc n => ... covers all elements exactly once.

Peano axioms for lists (LP1–LP5)

Just as $\mathbb{N}$ is inductively defined by $\{o, s\}$ , lists over $N$ are inductively defined by the constructors nil and :: (cons). The 'List Peano axioms' LP1-LP5 are:

LP1: $\text{nil}$ is a list.
LP2: If $a \in N$ and $l$ is a list, then $a :: l$ is a list.
LP3 (injectivity of ::): if $a :: l_1 = a' :: l_2$ then $a = a'$ and $l_1 = l_2$ .
LP4 (disjointness): $\text{nil} \neq a :: l$ for any $a, l$ .
LP5 (induction): A property $P$ holds for all lists if (a) $P(\text{nil})$ and (b) for all $a, l$ : $P(l) \Rightarrow P(a :: l)$ .

Together these say lists are freely generated by nil and cons — same shape as the Peano naturals.

Append and length as defining equations

On the unary naturals (Peano) we can derive total functions. The same is true on lists:

Definition 2.18 (Append).
- $\text{nil} \mathbin{@} r = r$
- $(a :: l) \mathbin{@} r = a :: (l \mathbin{@} r)$

Lemma 2.20. @ is total — it terminates on all inputs.
Proof. By induction on $l$ : base case $l = \text{nil}$ is trivial. Step case: $l = a :: l'$ , by induction hypothesis $l' \mathbin{@} r$ terminates, then $(a :: l') \mathbin{@} r = a :: (l' \mathbin{@} r)$ terminates by construction of ::. $\square$

Corollary 2.21. (l1 @ l2) @ l3 = l1 @ (l2 @ l3) — append is associative.

Definition 2.22 (Length).
- $\lambda(\text{nil}) = o$
- $\lambda(a :: l) = s(\lambda(l))$

Definition 2.23 (Reverse).
- $\rho(\text{nil}) = \text{nil}$
- $\rho(a :: l) = \rho(l) \mathbin{@} (a :: \text{nil})$

Compiler warnings on pattern matching

SML's compiler emits two useful warnings on incomplete or inconsistent patterns:

Match non-exhaustive: a case or fun does not cover all constructors.


   fun bad (suc n) = n;
   (* Warning: match non-exhaustive
      zero => ... *)

Match redundant: a clause is subsumed by an earlier clause.


   fun bad n = 0
     | bad n = 1;  (* redundant: never reached *)

These are not errors (SML will still produce a function — possibly raising Match at runtime for missing cases) but should be heeded.

Binary tree: defining flip

A binary tree is defined by the recursive datatype:


datatype btree = empty | node of int * btree * btree;

Or with named leaves:


datatype btree = Empty
               | Node of int * btree * btree;

The flip function swaps the left and right subtrees of every node:


fun flip Empty            = Empty
  | flip (Node (x, l, r)) = Node (x, flip r, flip l);

Trace. flip (Node (1, Node (2, Empty, Empty), Node (3, Empty, Empty))):
1. Outer: $x=1, l = Node(2, Empty, Empty), r = Node(3, Empty, Empty)$ .
2. Inner left: flip Empty = Empty. Inner right: flip Empty = Empty. So flipped left = Node(3, Empty, Empty). Flipped right = Node(2, Empty, Empty).
3. Result: Node(1, Node(3, Empty, Empty), Node(2, Empty, Empty)).

Termination. flip terminates by structural recursion on the tree: each recursive call is on a strictly smaller subtree.

Used in assignment 5.5.

Practice — score 100% to advance

Multiple choice

What does a datatype declaration list?

In datatype day = mon | tue | wed, how many values does day have?

What does datatype 'a option = NONE | SOME of 'a represent?

When you write functions over a datatype, what must you do?

In datatype shape = Circle of real | Square of real, what is Square 4.0?

Why are datatypes sometimes called 'algebraic'?

The inductive definition of lists uses constructors:

flip Empty returns:

Fill in the blank

fun bad (suc n) = n triggers a 'match ___' warning.

flip (Node (1, Node (2, E, E), E)) returns Node (1, E, Node (2, E, E)) — left and right are ___.

Match definitions

Match each concept on the left to its definition on the right.

Loading…