Chapter 7Formal Languagesnotes.en.pdf:142-143

Top-down and Bottom-up Parsing

Two strategies, one tree

Def 7.3.11 — Top-down parsingDef 7.3.12 — Bottom-up parsing

Concept

Parsing is the inverse of derivation: given a string, find a parse tree (or
prove none exists). There are two main strategies.

Top-down parsing (Definition 7.3.11, notes p. 142): start from the start
symbol $S$ and work downwards to terminals. Try each rule that has $S$ on the
left; for each, recursively try to derive the next nonterminal. Predictive:
you guess what each nonterminal should expand to based on the input.

Bottom-up parsing (Definition 7.3.12, notes p. 143): start from the input
string and work upwards to $S$ . Find substrings matching the right-hand side of
some rule, and replace them with the left-hand nonterminal. This is called
shift-reduce parsing.

A subtle distinction (notes p. 172):
- LL parsers are top-down (Left-to-right scan, Leftmost derivation).
- LL(1): one token of lookahead.
- LL(k): $k$ tokens of lookahead.
- LR parsers are bottom-up (Left-to-right scan, Rightmost derivation in
reverse).
- LR(0), SLR, LALR, LR(1): increasing power, increasing table size.

In practice:
- Top-down is conceptually simpler, fits recursive-descent parsers well.
- Bottom-up is more powerful (handles a wider class of grammars) and tends
to be more efficient. Yacc / Bison are bottom-up LR parser generators.

Theorem 7.3.14 (notes p. 173, Lee 2002): context-free parsing is equivalent
to Boolean matrix multiplication. This is why efficient CFG parsers run in
$O(n^3)$ — and why we are unlikely to find much faster practical algorithms.

Animation — parsing table

Transcript — click a line to jump8 cues

0.0sPredictive parsing: LL(1)
1.2sGrammar E → T + E | T appears at the top
3.0sParse table builds — rows for non-terminals, columns for terminals
5.5sEach cell holds the production to apply for that lookahead
6.5sInput string + n is queued; cursor lands on +
7.5sCursor + is looked up in row E, column + → production E → T + E
9.0sCursor moves to n; lookup in row T, column n → T → n
11.5sLookahead + table entry + production = next derivation step

Worked example

Step 0 of 2

Practice — score 100% to advance

Multiple choice

Top-down parsing starts from…

Bottom-up parsing replaces substrings matching…

What does LL(1) mean?

Why are bottom-up parsers often preferred?

Context-free parsing has the same complexity as…

Order the steps

Arrange these proof steps in the correct order using the arrows.

Bottom-up: replace body with the rule's head

Top-down: expand leftmost nonterminal using a rule

Top-down: begin with start symbol

S

Top-down: continue until string is all terminals and matches input

Bottom-up: find substring matching a rule body

Bottom-up: repeat until only

S

remains

Loading…