Chapter 7Formal Languagesnotes.en.pdf:140-141

Chomsky Hierarchy

Type 0 / 1 / 2 / 3: increasingly restrictive

Def 7.3.1Context-sensitive (type 1)Def 7.3.1Context-free (type 2)Def 7.3.1Regular (type 3)
Concept

Not all grammars are equally powerful. The Chomsky hierarchy (Noam Chomsky,
1965; Definition 7.3.1, notes p. 140) classifies grammars by the shape of their
production rules. Each level is more restrictive — and easier to parse.

The four levels, from least to most restrictive:

- Type 0Unrestricted (phrase-structure grammars). Heads contain at
least one nonterminal; bodies are arbitrary strings. Maximum power.

- Type 1Context-sensitive. Bodies have no fewer symbols than heads:
bh|b| \ge |h|. (Exception: SϵS \to \epsilon is allowed.) The rewrite depends on
the surrounding context.

- Type 2Context-free (CFG). Heads are a single nonterminal:
AbA \to b for ANA \in N. The rewrite does not depend on context. CFGs cover
most programming language syntax.

- Type 3Regular (REG). Heads are a single nonterminal; bodies are
empty or a single terminal, optionally followed by a single nonterminal.
Equivalent to finite-state machines. Minimum power; easiest to parse.

The hierarchy is strict: every regular language is context-free, every
context-free language is context-sensitive, every context-sensitive language is
unrestricted. Each containment is strict (there are languages at level ii that
are NOT at level i+1i+1).

Classic examples:
- {anbncnn0}\{a^n b^n c^n \mid n \ge 0\} — context-sensitive (Type 1). Cannot be CFG.
- {anbnn0}\{a^n b^n \mid n \ge 0\} — context-free (Type 2). Cannot be regular.
- {ann0}\{a^n \mid n \ge 0\} — regular (Type 3).

For SMAI and most AI applications, context-free grammars are the sweet spot:
expressive enough for most languages, parsing algorithms run in O(n3)O(n^3) (Theorem
7.3.14, notes p. 173).

Animation — chomsky hierarchy
Transcript — click a line to jump8 cues
  1. 0.0sChomsky Hierarchy
  2. 0.8sInnermost rectangle: Type 3, Regular; example (a|b)*.
  3. 1.9sType 2, Context-Free: example a^n b^n.
  4. 2.8sType 1, Context-Sensitive: example a^n b^n c^n.
  5. 3.7sType 0, Unrestricted: example L = a^n b^n c^n.
  6. 5.1sExamples stack as nested sets.
  7. 6.5sInclusions: T3 subset T2 subset T1 subset T0.
  8. 7.5sEach outer type strictly contains the inner ones.
{a^n b^n c^n} is context-sensitive

The language L={anbncnn1}L = \{a^n b^n c^n \mid n \geq 1\} is context-sensitive (Type 1) — it cannot be generated by a context-free grammar.

A context-sensitive grammar for LL:

S → abc | aABc A → aABc | abc cB → Bc bB → bb

The rule cB → Bc swaps c to the right of B — needed to gather all the cs at the end. This is the kind of cross-serial dependency that distinguishes context-sensitive from context-free.

Observation. Natural languages (English, German, ...) are believed to be context-sensitive in general. Crucially, humans parse them in real time — despite the theoretical worst-case complexity. This is the 'real-time parsing puzzle' and motivates incremental / psycholinguistic models of parsing.

{a^n} regular grammar

The language L={ann0}L = \{a^n \mid n \geq 0\} is regular. A Type-3 grammar:

S → aS | ε

The recursive rule S → aS produces any number of as; the base case S → ε (empty string) terminates. This is the simplest possible non-trivial regular language.

Worked example
Step 0 of 2
Practice — score 100% to advance
Multiple choice
Q1
Which is the most restrictive (smallest language class)?
Q2
A context-free grammar has rules of the form…
Q3
{anbncnn0}\{a^n b^n c^n \mid n \ge 0\} is…
Q4
Which type of grammar corresponds to finite-state machines?
Q5
Why are context-free grammars most useful in practice?
Q6
The Chomsky hierarchy is…
Q7
L={anbncnn1}L = \{a^n b^n c^n \mid n \geq 1\} is:
Fill in the blank
Q1
The rule S → aS | ε generates the language {ann\{a^n \mid n \geq }\}.
Order the steps
Arrange these proof steps in the correct order using the arrows.
1
Type 1: Context-sensitive — bodyhead|\text{body}| \ge |\text{head}|
2
Type 0: Unrestricted — any production rule with a nonterminal in the head
3
Type 3: Regular — body is ϵ\epsilon or a terminal, optional nonterminal
4
Type 2: Context-free — head is a single nonterminal
Loading…