Parse Trees
Trees that show how a string is derived
A derivation is a sequence of strings. A parse tree is the same information
shown as a tree. Trees are far easier to read and use than linear derivations.
Definition 7.3.7 (notes p. 141): Given a CFG accepting string , a
parse tree is a node-labeled, ordered tree representing the syntactic
structure of .
Definition 7.3.8: For a derivation (where is a
sentence), the parse tree is built recursively:
- If the last step is a lexical rule (single terminal), then is
a one-node tree with root labeled and a single child labeled .
- If factors as via rule , the root of is labeled , with one child for each
, where the -th child is the parse tree of the subderivation for .
Properties of parse trees:
- The root is always labeled with the start symbol .
- The leaves, read left to right, give the yield — the original string.
- Internal nodes are nonterminals; their children form the right-hand side of
the production used.
Ambiguity: a CFG is ambiguous if some string has multiple parse trees
(multiple derivations). Many grammars can be rewritten to remove ambiguity (but
not always — some CFLs are inherently ambiguous).
Onion principle: derivation peels off one layer of structure at a time,
descending to terminals. The parse tree makes this layering explicit.
- 0.0sParse Tree for: 1 + 2 * 3
- 0.8sGrammar on the left: E, T, F with sum and product levels.
- 2.1sRoot node E appears at the top.
- 2.7sApply E -> E + T: expand to E, +, T.
- 4.8sLeftmost E -> T: the left child becomes T.
- 6.2sThen T -> F: that T becomes F.
- 7.6sF -> n: the leftmost leaf is now 1.
- 9.0sT (right side) -> T * F: three new children appear.
- 10.8sMiddle T -> F and F -> n: leaf becomes 2.
- 12.1sRightmost F -> n: leaf becomes 3. Tree complete.
A term language is the set of all terms (trees) generated by a tree grammar (or by an SML datatype's constructors). Term languages are the natural objects of symbolic AI: every formula, every program, every expression is a term.
Definition 3.6. A term language over signature is the smallest set closed under the term constructors: , , and if is a -ary constructor and , then .
Example (G_arith). Arithmetic terms over variables and numbers:
term ::= var(varname)
| num(number)
| neg(term)
| sum(term, term)
| prod(term, term)
Onion principle. A term can be viewed 'onion-style': peel off the outermost constructor to get the direct subterms, then peel those recursively. This corresponds to structural recursion on the term.
Parse trees are term-language objects — they are the primary data structure of symbolic AI.