Chapter 7Formal Languagesnotes.en.pdf:119-121

Formal Languages

Kleene star, language operations

Def 7.1.7Kleene starDef 7.1.9Formal languageDef 7.1.13Language ops
Concept

A formal language is just a set of strings. That's it. Nothing more mysterious.
The whole point is to distinguish "good" strings (members of the language, like
well-formed programs) from "bad" strings (non-members, like syntax errors).

Kleene star (Definition 7.1.7, notes p. 119): for an alphabet Σ\Sigma:

Σ=iNΣi\Sigma^* = \bigcup_{i \in \mathbb{N}} \Sigma^i

This is the set of all strings over Σ\Sigma, including ϵ\epsilon. So
Σ={ϵ}ΣΣ2Σ3\Sigma^* = \{\epsilon\} \cup \Sigma \cup \Sigma^2 \cup \Sigma^3 \cup \dots.

Kleene plus: Σ+=Σ{ϵ}\Sigma^+ = \Sigma^* \setminus \{\epsilon\} — all nonempty strings.

Definition 7.1.9: A formal language over Σ\Sigma is a set LΣL \subseteq \Sigma^*.

Operations on languages (Definition 7.1.13):
- Union L1L2L_1 \cup L_2, intersection L1L2L_1 \cap L_2, complement ΣL\Sigma^* \setminus L.
- Concatenation L1L2={uwuL1,wL2}L_1 L_2 = \{uw \mid u \in L_1, w \in L_2\}.
- Power LnL^n: nn-fold concatenation of LL with itself (L0={ϵ}L^0 = \{\epsilon\}).
- Kleene closure L=n0LnL^* = \bigcup_{n \ge 0} L^n.

Important sanity check: a formal language may be infinite (e.g. all strings
starting with bb followed by any number of aa's) — but the alphabet is finite.
The infinite-ness comes from repetition.

Why this matters in AI:
- The set of valid programs in Python is a formal language over Unicode.
- The set of valid logical formulas in propositional logic is a formal language
over {p,q,r,¬,,,(,)}\{p, q, r, \neg, \land, \lor, (, )\}.
- The set of grammatical English sentences is (approximately) a formal language.

Language complement and reflection

Two more operations on languages over alphabet Σ\Sigma:

Complement (Def 1.7): L:=ΣL\overline{L} := \Sigma^* \setminus L. The set of all strings not in LL.

Reflection / Reversal (Def 1.8): LR:={wRwL}L^R := \{ w^R \mid w \in L \}, where wRw^R is ww reversed.

Kleene plus (Def 1.7): L+:=LL2L3L^+ := L \cup L^2 \cup L^3 \cup \cdots — one or more concatenations (vs. Kleene star Σ\Sigma^* which includes zero).

Examples over Σ={a,b}\Sigma = \{a, b\}:
- L={ab,ba}L = \{ab, ba\}. L=Σ{ab,ba}\overline{L} = \Sigma^* \setminus \{ab, ba\} — every string except ab and ba.
- LR={ba,ab}L^R = \{ba, ab\} — same set for this symmetric example.
- L+={ab,ba,abab,abba,baab,baba,}L^+ = \{ab, ba, abab, abba, baab, baba, \ldots\} — concatenations of length 1\geq 1.

Worked example
Step 0 of 2
Practice — score 100% to advance
Multiple choice
Q1
A formal language over Σ\Sigma is…
Q2
Σ\Sigma^* is the set of…
Q3
If L={a,bb}L = \{a, bb\}, then L2L^2 has how many elements?
Q4
L1={a}L_1 = \{a\} and L2={b}L_2 = \{b\}. What is L1L2L_1 L_2?
Q5
Why is a formal language allowed to be infinite?
Q6
If L={a}L = \{a\}, then L+=L^+ =
Fill in the blank
Q1
The reversal of abc is ___.
Match definitions
Match each concept on the left to its definition on the right.
Loading…