Chapter 7Formal Languagesnotes.en.pdf:141-142

Parse Trees

Trees that show how a string is derived

Def 7.3.7Parse treeDef 7.3.8Derivation tree
Concept

A derivation is a sequence of strings. A parse tree is the same information
shown as a tree. Trees are far easier to read and use than linear derivations.

Definition 7.3.7 (notes p. 141): Given a CFG GG accepting string ss, a
parse tree is a node-labeled, ordered tree representing the syntactic
structure of ss.

Definition 7.3.8: For a derivation D:SGtD : S \Rightarrow^*_G t (where tt is a
sentence), the parse tree PDP_D is built recursively:
- If the last step is a lexical rule StS \to t (single terminal), then PDP_D is
a one-node tree with root labeled SS and a single child labeled tt.
- If DD factors as SGsGptS \Rightarrow^*_G s \Rightarrow^p_G t via rule ha1anh \to a_1 \dots a_n, the root of PDP_D is labeled hh, with one child for each
aia_i, where the ii-th child is the parse tree of the subderivation for aia_i.

Properties of parse trees:
- The root is always labeled with the start symbol SS.
- The leaves, read left to right, give the yield — the original string.
- Internal nodes are nonterminals; their children form the right-hand side of
the production used.

Ambiguity: a CFG is ambiguous if some string has multiple parse trees
(multiple derivations). Many grammars can be rewritten to remove ambiguity (but
not always — some CFLs are inherently ambiguous).

Onion principle: derivation peels off one layer of structure at a time,
descending to terminals. The parse tree makes this layering explicit.

Animation — parse tree build
Transcript — click a line to jump10 cues
  1. 0.0sParse Tree for: 1 + 2 * 3
  2. 0.8sGrammar on the left: E, T, F with sum and product levels.
  3. 2.1sRoot node E appears at the top.
  4. 2.7sApply E -> E + T: expand to E, +, T.
  5. 4.8sLeftmost E -> T: the left child becomes T.
  6. 6.2sThen T -> F: that T becomes F.
  7. 7.6sF -> n: the leftmost leaf is now 1.
  8. 9.0sT (right side) -> T * F: three new children appear.
  9. 10.8sMiddle T -> F and F -> n: leaf becomes 2.
  10. 12.1sRightmost F -> n: leaf becomes 3. Tree complete.
Term languages

A term language is the set of all terms (trees) generated by a tree grammar (or by an SML datatype's constructors). Term languages are the natural objects of symbolic AI: every formula, every program, every expression is a term.

Definition 3.6. A term language over signature Σ\Sigma is the smallest set TT closed under the term constructors: var(x)T\text{var}(x) \in T, const(c)T\text{const}(c) \in T, and if ff is a kk-ary constructor and t1,,tkTt_1, \ldots, t_k \in T, then f(t1,,tk)Tf(t_1, \ldots, t_k) \in T.

Example (G_arith). Arithmetic terms over variables and numbers:

term ::= var(varname) | num(number) | neg(term) | sum(term, term) | prod(term, term)

Onion principle. A term can be viewed 'onion-style': peel off the outermost constructor to get the direct subterms, then peel those recursively. This corresponds to structural recursion on the term.

Parse trees are term-language objects — they are the primary data structure of symbolic AI.

Worked example
Step 0 of 2
Practice — score 100% to advance
Multiple choice
Q1
The root of a parse tree is labeled with…
Q2
The yield of a parse tree is the…
Q3
When is a CFG called ambiguous?
Q4
Internal nodes of a parse tree are labeled with…
Q5
The 'onion principle' refers to…
Q6
Why are parse trees preferred over raw derivations?
Q7
In the onion view of a term, you start by:
Fill in the blank
Q1
The language of all arithmetic expressions built from var/num/neg/sum/prod is a ___ language.
Order the steps
Arrange these proof steps in the correct order using the arrows.
1
Recurse on each nonterminal child until all children are terminals
2
Read leaves left-to-right; verify yield equals the input string
3
Draw root labeled with start symbol SS
4
Apply a production rule to SS — add children for each right-hand symbol
Loading…