Context-Free Grammar Theory
A Context-Free Grammar (CFG) is a formal system used to generate all possible strings in a language. While powerful, raw grammars often contain structural overhead that hinders parsing performance.
01. Core Foundations
The specialized components that make up any formal grammar.
Variables V
Recursive building blocks (e.g., S, A, B) that act as placeholders for larger structures.
Terminals Σ
The "leaf" nodes (e.g., a, b, 0, 1) that cannot be further expanded. These form the actual strings.
Productions P
Substitution rules defined as LHS → RHS. The LHS must be a single Variable.
02. The Simplification Pipeline
Correct order matters. We must remove nulls and units before pruning useless symbols.
Elimination of ε-Rules
Removes rules that derive the empty string (A → ε).
If A → αBβ and B is Nullable : Add A → αβ
Procedure: Find all Nullable variables via fixed-point iteration and generate all valid combinations for each rule.
Elimination of Unit Rules
Removes "alias" transitions like A → B.
If A → B and B → γ : Add A → γ directly.
Procedure: Compute the full closure for each variable and substitute dependencies directly via non-unit rules.
Useless Symbols Pruning
Deletes symbols that never produce terminals or are unreachable from the Start symbol.
R = { Variables reachable from the Start symbol }
Procedure: Strict two-pass sweep isolating the valid mathematical subset.
03. Visual Case Study
Observe how a complex grammar is synthesized into its most efficient form.
| Category | Original Rule | Transformation Logic | Simplified Result |
|---|---|---|---|
| Nulls | A → a | ε |
Propagate ε to all parents |
A → a |
| Units | S → A |
Substitute A's RHS into S |
S → a |
| Useless | B → b |
If B is never reached from S |
(Removed) |
Why do we simplify?
Compiler Efficiency
Reduces the depth of Abstract Syntax Trees (ASTs), leading to faster semantic analysis and code generation.
Language Design
Ensures that the defined language is clean, non-ambiguous, and easier for humans and machines to parse.
Normalization Forms
A required step for Chomsky Normal Form conversion, enabling algorithms like CYK to work in O(n³) time.
Define Your Context-Free Grammar
Provide your rules using the text area or the dynamic rule builder. Use
? or '' for empty productions.
Text Editor
Dynamic Rule Builder
| or slash/[Lower Case characters will automatically be converted
to Upper Case in the LHS].
For people familiar with Context-Free Grammars (CFG)
For those who want to learn the entire process
Pipeline Execution Trace
Observe the transition from complexity to strict efficiency.
Stage 1: Null Productions Removal
Identifies nullable non-terminals (deriving ε) and dynamically
generates alternative productions.
Algorithm: Remove ε-Productions
1. Compute Nullable Set (N):
- Add variables that directly derive ε.
- Iteratively add variables that derive strings entirely composed of symbols in N.
2. For each rule containing symbols in N:
- Generate combinations replacing each occurrence with ε.
- Add newly generated rules explicitly.
- Remove all original ε-productions.
Algorithmic Traces
Dependency Graph
Before Stage 1
After Stage 1
Stage 2: Unit Productions Removal
Eliminates A → B transitions by determining derivation closures and
making direct substitutions.
Algorithm: Remove Unit Productions
1. Compute Derivation Closure D(A) for every variable:
- Initialize D(A) = {A}.
- If B is in D(A) and there is a unit rule B → C, add C to D(A).
2. For each variable A:
- For every variable B in D(A), map all non-unit rules of B to A.
- Remove the original unit rules (A → B).
Algorithmic Traces
Dependency Graph
Before Stage 2
After Stage 2
Stage 3: Useless Symbols Removal
Eliminates non-productive symbols and symbols unreachable from the start state.
Algorithm: Remove Useless Symbols
1. Compute Generating Set (G) [Productive]:
- Add all variables that directly derive strings of terminals.
- Iteratively add variables that derive strings of symbols currently in G.
2. Compute Reachable Set (R):
- Start with R = {S}.
- Iteratively add any variable that appears on RHS of rules mapped from variables in R.
- Remove any rules for variables not in R.
Algorithmic Traces
Dependency Graph
Before Stage 3
After Stage 3
Stage 4: Validation & Summary
A rigorous double-pass safety check. The pipeline is run a second time to ensure the grammar has converged to absolute stability.
Status: Validation Pass Log
The algorithms (Null → Unit → Useless) are applied again to the 'Final' state.
If the resulting grammar matches structurally, the CFG is mathematically proven to be
minimal and stable. A status banner will appear below to confirm.