Phrase-Structure Grammars
4-tuple ⟨N, Σ, P, S⟩ that generates a language
Phrase-structure grammars (PSG) are the most general kind of grammar — the type 0 grammars of the Chomsky hierarchy. Per the slides Chapter 8 (Def 1.1, p.347), a PSG is a 4-tuple where:
- — a finite set of nonterminal symbols (placeholders that get rewritten).
- — a finite set of terminal symbols (the actual characters of the language). .
- — a finite set of production rules of the form (head → body).
- — a distinguished start symbol.
The language generated by is the set of all terminal strings derivable from .
But to talk rigorously about any mathematical object we need more than a pile of sets — we need a way to bundle them together. The slides introduce record data structures (Def 1.2, p.347) for this: container types with named components, accessed via dot notation. C structs, OOP classes, and SML records are all examples. A PSG is naturally a record with four named fields.
The slides go further with structure signatures (Def 1.5, p.349): a tabular overview listing each component, its type, and an accessor name. We will use signatures throughout Chapter 5 to keep track of what each structure contains.
- 0.0sA PSG as a record
- 1.0sCyan rounded box draws with header: Phrase-Structure Grammar
- 2.0sThree divider lines split the box into four fields
- 2.8sFields fade in: N, Sigma, P, S
- 4.5sAmber arrow points at N: non-terminals
- 5.3sArrow moves to Sigma: terminal alphabet
- 6.1sArrow moves to P: production rules
- 6.9sArrow moves to S: start symbol
- 8.3sA PSG is a 4-tuple N Sigma P S, as code it's a struct or class