Chapter 4Standard ML

Datatypes

Algebraic types: case-of, constructors

Concept

A datatype (Def 3.1 in the slides, Chapter 5 section 3) is SML's mechanism for defining algebraic / inductive types. A datatype declaration lists constructors that build values of the type.

Simple example — enumeration:
datatype day = mon | tue | wed | thu | fri | sat | sun;

This creates seven nullary constructors; day is a type with exactly seven values. Functions over an enumeration use complete pattern matching:

fun is_weekend sat = true | is_weekend sun = true | is_weekend _ = false;

Constructors with arguments:

datatype shape = Circle of real | Square of real | Triangle of real * real * real;

This is the geometric-shapes example. Circle r is a value, Triangle(3.0, 4.0, 5.0) is a value, etc.

Computing on a datatype — area:

fun area (Circle r) = Math.pi * r * r | area (Square a) = a * a | area (Triangle(a, b, c)) = let val s = (a + b + c) / 2.0 in Math.sqrt(s * (s - a) * (s - b) * (s - c)) end;

The Maybe pattern:
datatype 'a option = NONE | SOME of 'a;

This is the standard way to handle partial functions: NONE when there is no result, SOME x when there is. Pattern-matching on the constructor forces you to handle both cases. (This is exactly how SML's library represents results of TextIO.input1 and similar partial operations.)

Animation — sml datatypes
Transcript — click a line to jump10 cues
  1. 0.0sAlgebraic Datatypes
  2. 1.4sType label appears: datatype tree = ...
  3. 2.0sTokens reveal: datatype tree = Empty |
  4. 4.8sSecond line: Node of int times tree times tree
  5. 6.1sConstructors Empty and Node are boxed amber
  6. 7.5sSample tree label and a sample expression appear
  7. 9.3sEdges drawn, root node 1 fades in
  8. 10.3sLeft child node 2 fades in
  9. 11.1sThree empty leaves appear with slashes
  10. 12.4sEmpty is the base case; Node recursively builds
Inductively defined sets: no confusion, no junk

An inductively defined set (Def 2.8) on a family of sets N1,,NkN_1, \ldots, N_k has the form

SN1NkS \subseteq N_1 \cup \cdots \cup N_k

where each NiN_i is the image of a constructor ci:TiSc_i: T_i \to S for some 'type' TiT_i over SS itself. We require:

  • Closure: Every ci(t)c_i(t) for tTit \in T_i is in SS.
  • No internal confusion (injectivity): Different inputs t1t2t_1 \neq t_2 give ci(t1)ci(t2)c_i(t_1) \neq c_i(t_2)within the same constructor.
  • No inter-constructor confusion: Images of different constructors are disjoint: im(ci)im(cj)=\text{im}(c_i) \cap \text{im}(c_j) = \emptyset for iji \neq j.
  • No junk: Every element of SS is reachable via the constructors.

Example: N1=N,{s,o}\mathbb{N}_1 = \langle \mathbb{N}, \{s, o\} \rangle with o=0o = 0 and s(n)=n+1s(n) = n+1. The four conditions are exactly the Peano axioms (Def 1.6).

In SML, this is exactly what datatype declarations provide:

datatype mynat = zero | suc of mynat;

The 'no confusion' rules mean pattern matching is sound: case e of zero => ... | suc n => ... covers all elements exactly once.

Peano axioms for lists (LP1–LP5)

Just as N\mathbb{N} is inductively defined by {o,s}\{o, s\}, lists over NN are inductively defined by the constructors nil and :: (cons). The 'List Peano axioms' LP1-LP5 are:

  • LP1: nil\text{nil} is a list.
  • LP2: If aNa \in N and ll is a list, then a::la :: l is a list.
  • LP3 (injectivity of ::): if a::l1=a::l2a :: l_1 = a' :: l_2 then a=aa = a' and l1=l2l_1 = l_2.
  • LP4 (disjointness): nila::l\text{nil} \neq a :: l for any a,la, l.
  • LP5 (induction): A property PP holds for all lists if (a) P(nil)P(\text{nil}) and (b) for all a,la, l: P(l)P(a::l)P(l) \Rightarrow P(a :: l).

Together these say lists are freely generated by nil and cons — same shape as the Peano naturals.

Append and length as defining equations

On the unary naturals (Peano) we can derive total functions. The same is true on lists:

Definition 2.18 (Append).
- nil@r=r\text{nil} \mathbin{@} r = r
- (a::l)@r=a::(l@r)(a :: l) \mathbin{@} r = a :: (l \mathbin{@} r)

Lemma 2.20. @ is total — it terminates on all inputs.
Proof. By induction on ll: base case l=nill = \text{nil} is trivial. Step case: l=a::ll = a :: l', by induction hypothesis l@rl' \mathbin{@} r terminates, then (a::l)@r=a::(l@r)(a :: l') \mathbin{@} r = a :: (l' \mathbin{@} r) terminates by construction of ::. \square

Corollary 2.21. (l1 @ l2) @ l3 = l1 @ (l2 @ l3) — append is associative.

Definition 2.22 (Length).
- λ(nil)=o\lambda(\text{nil}) = o
- λ(a::l)=s(λ(l))\lambda(a :: l) = s(\lambda(l))

Definition 2.23 (Reverse).
- ρ(nil)=nil\rho(\text{nil}) = \text{nil}
- ρ(a::l)=ρ(l)@(a::nil)\rho(a :: l) = \rho(l) \mathbin{@} (a :: \text{nil})

Compiler warnings on pattern matching

SML's compiler emits two useful warnings on incomplete or inconsistent patterns:

  • Match non-exhaustive: a case or fun does not cover all constructors.
fun bad (suc n) = n; (* Warning: match non-exhaustive zero => ... *)
  • Match redundant: a clause is subsumed by an earlier clause.
fun bad n = 0 | bad n = 1; (* redundant: never reached *)

These are not errors (SML will still produce a function — possibly raising Match at runtime for missing cases) but should be heeded.

Binary tree: defining flip

A binary tree is defined by the recursive datatype:

datatype btree = empty | node of int * btree * btree;

Or with named leaves:

datatype btree = Empty | Node of int * btree * btree;

The flip function swaps the left and right subtrees of every node:

fun flip Empty = Empty | flip (Node (x, l, r)) = Node (x, flip r, flip l);

Trace. flip (Node (1, Node (2, Empty, Empty), Node (3, Empty, Empty))):
1. Outer: x=1,l=Node(2,Empty,Empty),r=Node(3,Empty,Empty)x=1, l = Node(2, Empty, Empty), r = Node(3, Empty, Empty).
2. Inner left: flip Empty = Empty. Inner right: flip Empty = Empty. So flipped left = Node(3, Empty, Empty). Flipped right = Node(2, Empty, Empty).
3. Result: Node(1, Node(3, Empty, Empty), Node(2, Empty, Empty)).

Termination. flip terminates by structural recursion on the tree: each recursive call is on a strictly smaller subtree.

Used in assignment 5.5.

Practice — score 100% to advance
Multiple choice
Q1
What does a datatype declaration list?
Q2
In datatype day = mon | tue | wed, how many values does day have?
Q3
What does datatype 'a option = NONE | SOME of 'a represent?
Q4
When you write functions over a datatype, what must you do?
Q5
In datatype shape = Circle of real | Square of real, what is Square 4.0?
Q6
Why are datatypes sometimes called 'algebraic'?
Q7
The inductive definition of lists uses constructors:
Q8
flip Empty returns:
Fill in the blank
Q1
fun bad (suc n) = n triggers a 'match ___' warning.
Q2
flip (Node (1, Node (2, E, E), E)) returns Node (1, E, Node (2, E, E)) — left and right are ___.
Match definitions
Match each concept on the left to its definition on the right.
Loading…