Chapter 7Formal Languagesnotes.en.pdf:128-129

Abstract vs Concrete Grammars

Tree grammars vs bracketed string grammars

Def 7.5.1Tree grammar
Concept

The abstract grammar of a language is the parse tree; the concrete
grammar
is the bracketed string. This distinction is crucial in compilers and
symbolic AI.

Definition 7.5.1 (notes p. 128): A tree grammar is a grammar where the
result data type is trees, not strings.

Definition 7.5.2: A regular tree grammar is a triple S,N,P\langle S, N, P \rangle with start symbol SS, nonterminals NN, and productions AtA \to t
where ANA \in N and tt is a "well-formed expression" over Σ\Sigma (terminal
constructors) and NN.

Definition 7.5.3 (notes p. 129): The underlying tree grammar of a language is
often called the abstract grammar; any derived (string) grammars — there
may be multiple — are called concrete grammars.

Why abstract is better:
- No need for brackets/parens to disambiguate.
- One canonical tree per formula (often).
- Tree manipulation is direct — recursion over tree structure.

Why concrete is needed:
- Humans want to read and write strings.
- Files and network communication use strings.
- We need a way to convert between abstract and concrete.

Example: an arithmetic expression (2+3)×5(2 + 3) \times 5.

  • Abstract (parse tree):
prod
/   \
sum   5
/ \
2  3
  • Concrete: the string (2+3)×5(2 + 3) \times 5 (with explicit brackets and infix).

Both describe the same expression. In a compiler, the front-end parses the
concrete syntax into the abstract syntax tree (AST); the back-end optimizes and
generates code from the AST.

Programming with parse trees in SML: aterm datatype

SML datatypes give a concrete representation of term languages. For arithmetic:

datatype aterm = anum of int | avar of int | aneg of aterm | asum of aterm * aterm | aprod of aterm * aterm;

Example term representing ((X29+3)555)-((X_{29} + 3) * 555):

val ex2 = aneg (aprod (asum (avar 29, anum 3), anum 555));

This datatype is recursive: anum, aneg, etc. all contain aterms. Pattern matching on the constructors is how we write recursive functions over terms.

Evaluating an arithmetic term (variable bindings as int list):

fun aeval [] (anum n) = n | aeval env (avar i) = lookup env i | aeval env (aneg t) = ~ (aeval env t) | aeval env (asum (l, r)) = aeval env l + aeval env r | aeval env (aprod (l, r)) = aeval env l * aeval env r;

Serialising a term to a string:

fun aprint (anum n) = Int.toString n | aprint (avar i) = "X" ^ Int.toString i | aprint (aneg t) = "-" ^ "(" ^ aprint t ^ ")" | aprint (asum (l, r)) = "(" ^ aprint l ^ "+" ^ aprint r ^ ")" | aprint (aprod (l, r)) = "(" ^ aprint l ^ "*" ^ aprint r ^ ")";

Test: aprint ex2 evaluates to "-(((X29+3)*555))". ✓

Standardising math expressions: 5 categories

Mathematical expressions can be classified into five categories that capture the main syntactic patterns:

  • Literal — a constant number: 3, 42, 3.14.
  • Named variable — a variable symbol: x, y_{i,j}.
  • Operator applicationf(a, b), +, *, =.
  • Named constant — a named constant (different from a literal): π\pi, ee, cc.
  • Binding — introduces new variables: x.P(x)\forall x. P(x), 01f(x)dx\int_0^1 f(x) dx, i=0nai\sum_{i=0}^n a_i.

In OpenMath (Def 5.5), these are abstract symbols:

term ::= lit(real) | var(string) | app(omsymbol, term*) (* operator application *) | const(omsymbol) (* named constant *) | bind(omsymbol, var*, term) (* binding *)

Where omsymbol is a Content Dictionary entry — a URI identifying a known mathematical object. This lets expressions be transmitted symbolically with full semantic meaning.

Renaming bound variables preserves meaning: x.q(x)=y.q(y)\forall x. q(x) = \forall y. q(y). This is called α\alpha-renaming or bound-variable renaming.

OpenMath XML encoding

OpenMath objects can be serialised as XML using a small set of tags:

  • <OMOBJ> — wraps an OpenMath object.
  • <OMA> — operator application.
  • <OMBIND> — binding (binder + bound vars + body).
  • <OMBVAR> — list of bound variables.
  • <OMS> — a symbol reference (a named constant or operator).
  • <OMV> — a variable.

Example: a,b.a+b=b+a\forall a, b.\, a + b = b + a encodes as:

&lt;OMOBJ&gt; &lt;OMBIND&gt; &lt;OMS cd="logic1" name="forall"/&gt; &lt;OMBVAR&gt;&lt;OMV name="a"/&gt;&lt;OMV name="b"/&gt;&lt;/OMBVAR&gt; &lt;OMA&gt; &lt;OMS cd="relation1" name="eq"/&gt; &lt;OMA&gt;&lt;OMS cd="arith1" name="plus"/&gt;&lt;OMV name="a"/&gt;&lt;OMV name="b"/&gt;&lt;/OMA&gt; &lt;OMA&gt;&lt;OMS cd="arith1" name="plus"/&gt;&lt;OMV name="b"/&gt;&lt;OMV name="a"/&gt;&lt;/OMA&gt; &lt;/OMA&gt; &lt;/OMBIND&gt; &lt;/OMOBJ&gt;

This is a concrete XML grammar for an abstract syntax — same idea as our abstract vs concrete grammars, with cd="..." pointing to a Content Dictionary.

Worked example
Step 0 of 2
Practice — score 100% to advance
Multiple choice
Q1
An abstract grammar generates…
Q2
A concrete grammar generates…
Q3
Why use abstract syntax trees in compilers?
Q4
What is a regular tree grammar?
Q5
Why do we still need concrete syntax?
Q6
The recursive case anum of int in a datatype:
Q7
OpenMath's OMS element refers to:
Fill in the blank
Q1
The term aprod(asum(avar 29, anum 3), anum 555) has root constructor ___.
Match definitions
Match each concept on the left to its definition on the right.
Loading…