Phrase-Structure Grammars
⟨N, Σ, P, S⟩ — the workhorse of syntax
A phrase-structure grammar (PSG) is a finite recipe that can describe an
infinite language. It is the workhorse of syntax in formal language theory.
Definition 7.2.1 (notes p. 134): A PSG is a 4-tuple
where:
- — finite set of non-terminal symbols (placeholders, "variables").
- — finite set of terminal symbols (the actual alphabet of the language); .
- — finite set of production rules (or ), where
(head, contains at least one
nonterminal) and (body).
- — the start symbol (also sentence symbol).
A production rule says "rewrite as ". We may write
as a shorthand for rules with the
same head.
Derivation (Definition 7.2.2): derives in one step, written
, if there is a production rule and strings
such that and . We write for the
reflexive-transitive closure (zero or more steps).
Sentential form (Definition 7.2.3): any reachable from via .
A sentence is a sentential form with no nonterminals — i.e., a string in
.
Language generated (Definition 7.2.4): is the set of all sentences of .
Two grammars are equivalent (Definition 7.2.5) iff they generate the same
language.
- 0.0sGrammar Derivation Steps
- 0.8sGrammar: E -> E + E | n.
- 2.1sStart with the single nonterminal E.
- 3.0sApply E -> E + E; we now have E + E.
- 4.1sApply E -> n on the left; we have n + E.
- 5.2sApply E -> n on the right; we have n + n.
- 6.3sHighlight n + n: this string is in the language.
A PSG with rule set is often split into two parts:
- Lexical rules (Def 2.10): produce individual words / tokens / lexemes from characters. Example:
Det → the | a | an. - Structural / syntactic rules (Def 2.11): combine words into phrases. Example:
NP → Det N | Pron.
A lexicon (or vocabulary, Def 2.11) is the set of terminal strings produced by the lexical rules — i.e. the 'words' of the language. The syntactic categories (Def 2.12) are the nonterminals of the structural rules (e.g. NP, VP, S).
This split mirrors how real NLP systems work: a lexer/tokeniser first chops the input into tokens, then a parser builds the phrase structure.
Definition 2.7. Parsing is syntax analysis — given a string and a grammar , decide whether and produce a parse tree if so.
Parsing is the inverse of derivation: derivation goes from outward to a string; parsing goes from a string back to (and the tree of how).