Convert finite state machine to regular expression
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finite state machines (FSMs) and regular expressions (regex) are two equivalent ways to describe regular languages. An FSM processes input character by character through states and transitions, while a regex describes the same language as an algebraic pattern. Converting an FSM to its equivalent regex is a foundational technique in automata theory, used in compiler design, lexical analysis, and text processing. This article walks through the state elimination method, the most practical conversion algorithm.
Finite State Machine Basics
A finite state machine is defined by five components:
- States : A finite set of states.
- Alphabet : A finite set of input symbols.
- Transition function : Maps (state, symbol) pairs to next states.
- Start state : The state where processing begins.
- Accept states : States that indicate a valid input sequence.
FSMs come in two flavors:
- DFA (Deterministic Finite Automaton): Exactly one transition per symbol per state.
- NFA (Nondeterministic Finite Automaton): Zero, one, or many transitions per symbol, including epsilon transitions (transitions without consuming input).
Both have the same expressive power as regular expressions.
Regular Expression Basics
Regular expressions use a small set of operators to describe patterns:
- Literals: Individual characters like
aor1. - Concatenation:
abmatches "a" followed by "b". - Union (alternation): matches "a" or "b".
- Kleene star: matches zero or more occurrences of "a".
- Parentheses: Group sub-expressions, e.g., matches zero or more occurrences of "ab".
The State Elimination Method
State elimination is the most intuitive and widely taught method for converting an FSM to a regex. The idea is to remove states one at a time, replacing the transitions through each removed state with regex-labeled transitions between the remaining states.
Preparation
- If the FSM has multiple accept states, add a new single accept state and connect each original accept state to with an epsilon transition.
- If the start state has incoming transitions, add a new start state with an epsilon transition to the original start state.
After preparation, the FSM has exactly one start state (with no incoming edges) and one accept state (with no outgoing edges).
Elimination Procedure
For each state to be eliminated (not or ):
- For every pair of states where has an edge to labeled and has an edge to labeled , and has a self-loop labeled :Add (or update) an edge from to with the regex:If there is already an edge from to labeled , the new label becomes:
- Remove and all its edges.
- Repeat until only and remain.
The label on the single remaining edge from to is the regex for the language.
Worked Example
Consider an NFA with:
- States:
- Alphabet:
- Start state:
- Accept state:
- Transitions:
- (self-loop)
Step 1: Preparation
The start state has a self-loop (incoming edge), so add a new start state with an epsilon edge to . The accept state has no outgoing edges, so add a new accept state with an epsilon edge from .
Step 2: Eliminate
State has:
- Incoming edge from labeled
- Outgoing edge to labeled
- No self-loop
The new edge from to is labeled (since there is no self-loop, ).
Remove .
Step 3: Eliminate
State has:
- Incoming edge from labeled
- Self-loop labeled
- Outgoing edge to labeled
The new edge from to is:
Step 4: Eliminate
State has:
- Incoming edge from labeled
- Outgoing edge to labeled
- No self-loop
The final edge from to is .
Result: The equivalent regular expression is .
Summary Table
| Step | Description |
| Preparation | Ensure unique start/accept states with no conflicting edges |
| State Elimination | Remove states one by one, merging transitions into regex labels |
| Transition Update | New label: , union with existing edges |
| Final Result | The label on the sole remaining edge is the regex |
Complexity and Practical Notes
The size of the resulting regex can be exponential in the number of states. Different elimination orders produce different (but equivalent) regex expressions, and some orders yield simpler results than others. In practice:
- Eliminate states with the fewest connections first to keep intermediate expressions small.
- Simplify regex expressions after each elimination step (e.g., , ).
- For DFAs with many states, consider Brzozowski's algebraic method or Arden's Lemma as alternative approaches.
Common Pitfalls
- Forgetting to handle the self-loop on the eliminated state. Without , paths that loop through the state are lost.
- Not merging with existing edges (using union) when a direct edge already exists between and .
- Skipping the preparation step. If the start state has incoming edges or there are multiple accept states, the algorithm produces incorrect results.
Conclusion
Converting a finite state machine to a regular expression using state elimination is a systematic, mechanical process. Each state removal translates a set of transitions into a regex fragment, and the final result captures the entire language of the FSM. This equivalence between FSMs and regex is one of the foundational results of computation theory, proving that both formalisms have exactly the same expressive power.

