Abstract vs Concrete Grammars
Tree grammars vs bracketed string grammars
The abstract grammar of a language is the parse tree; the concrete
grammar is the bracketed string. This distinction is crucial in compilers and
symbolic AI.
Definition 7.5.1 (notes p. 128): A tree grammar is a grammar where the
result data type is trees, not strings.
Definition 7.5.2: A regular tree grammar is a triple with start symbol , nonterminals , and productions
where and is a "well-formed expression" over (terminal
constructors) and .
Definition 7.5.3 (notes p. 129): The underlying tree grammar of a language is
often called the abstract grammar; any derived (string) grammars — there
may be multiple — are called concrete grammars.
Why abstract is better:
- No need for brackets/parens to disambiguate.
- One canonical tree per formula (often).
- Tree manipulation is direct — recursion over tree structure.
Why concrete is needed:
- Humans want to read and write strings.
- Files and network communication use strings.
- We need a way to convert between abstract and concrete.
Example: an arithmetic expression .
- Abstract (parse tree):
prod / \ sum 5 / \ 2 3
- Concrete: the string (with explicit brackets and infix).
Both describe the same expression. In a compiler, the front-end parses the
concrete syntax into the abstract syntax tree (AST); the back-end optimizes and
generates code from the AST.
SML datatypes give a concrete representation of term languages. For arithmetic:
datatype aterm = anum of int
| avar of int
| aneg of aterm
| asum of aterm * aterm
| aprod of aterm * aterm;
Example term representing :
val ex2 = aneg (aprod (asum (avar 29, anum 3), anum 555));
This datatype is recursive: anum, aneg, etc. all contain aterms. Pattern matching on the constructors is how we write recursive functions over terms.
Evaluating an arithmetic term (variable bindings as int list):
fun aeval [] (anum n) = n
| aeval env (avar i) = lookup env i
| aeval env (aneg t) = ~ (aeval env t)
| aeval env (asum (l, r)) = aeval env l + aeval env r
| aeval env (aprod (l, r)) = aeval env l * aeval env r;
Serialising a term to a string:
fun aprint (anum n) = Int.toString n
| aprint (avar i) = "X" ^ Int.toString i
| aprint (aneg t) = "-" ^ "(" ^ aprint t ^ ")"
| aprint (asum (l, r)) = "(" ^ aprint l ^ "+" ^ aprint r ^ ")"
| aprint (aprod (l, r)) = "(" ^ aprint l ^ "*" ^ aprint r ^ ")";
Test: aprint ex2 evaluates to "-(((X29+3)*555))". ✓
Mathematical expressions can be classified into five categories that capture the main syntactic patterns:
- Literal — a constant number:
3,42,3.14. - Named variable — a variable symbol:
x,y_{i,j}. - Operator application —
f(a, b),+,*,=. - Named constant — a named constant (different from a literal): , , .
- Binding — introduces new variables: , , .
In OpenMath (Def 5.5), these are abstract symbols:
term ::= lit(real)
| var(string)
| app(omsymbol, term*) (* operator application *)
| const(omsymbol) (* named constant *)
| bind(omsymbol, var*, term) (* binding *)
Where omsymbol is a Content Dictionary entry — a URI identifying a known mathematical object. This lets expressions be transmitted symbolically with full semantic meaning.
Renaming bound variables preserves meaning: . This is called -renaming or bound-variable renaming.
OpenMath objects can be serialised as XML using a small set of tags:
<OMOBJ>— wraps an OpenMath object.<OMA>— operator application.<OMBIND>— binding (binder + bound vars + body).<OMBVAR>— list of bound variables.<OMS>— a symbol reference (a named constant or operator).<OMV>— a variable.
Example: encodes as:
<OMOBJ>
<OMBIND>
<OMS cd="logic1" name="forall"/>
<OMBVAR><OMV name="a"/><OMV name="b"/></OMBVAR>
<OMA>
<OMS cd="relation1" name="eq"/>
<OMA><OMS cd="arith1" name="plus"/><OMV name="a"/><OMV name="b"/></OMA>
<OMA><OMS cd="arith1" name="plus"/><OMV name="b"/><OMV name="a"/></OMA>
</OMA>
</OMBIND>
</OMOBJ>
This is a concrete XML grammar for an abstract syntax — same idea as our abstract vs concrete grammars, with cd="..." pointing to a Content Dictionary.
anum of int in a datatype:OMS element refers to:aprod(asum(avar 29, anum 3), anum 555) has root constructor ___.