Convert finite state machine to regular expression

Introduction

Finite state machines (FSMs) and regular expressions (regex) are two equivalent ways to describe regular languages. An FSM processes input one symbol at a time by following transitions between states, while a regex describes the same language as an algebraic pattern. The equivalence between the two is a foundational result in automata theory and underpins tools in compiler design, lexical analysis, and text processing. This article walks through the state elimination method, a practical and widely taught algorithm for converting an FSM into an equivalent regex.

Finite State Machine Basics

A finite state machine is defined by five components:

States $Q$: A finite set of states.
Alphabet $\Sigma$: A finite set of input symbols.
Transition function $\delta$: Maps (state, symbol) pairs to a next state (in a DFA) or to a set of next states (in an NFA).
Start state $q_0$: The state where processing begins.
Accept states $F$: The subset of states in which an input is accepted if processing ends there.
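The five components map directly onto plain data structures. The following Python sketch encodes a small DFA as a five-tuple and runs it on an input string; the automaton itself (binary strings containing an even number of 1s) and all names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical example DFA: binary strings containing an even number of 1s.
Q: Set[str] = {"even", "odd"}                 # states
SIGMA: Set[str] = {"0", "1"}                  # alphabet
DELTA: Dict[Tuple[str, str], str] = {         # transition function
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
Q0 = "even"                                   # start state
F: Set[str] = {"even"}                        # accept states


def dfa_accepts(word: str) -> bool:
    """Run the DFA on word, one symbol at a time."""
    state = Q0
    for symbol in word:
        if symbol not in SIGMA:
            return False                      # symbol outside the alphabet
        state = DELTA[(state, symbol)]        # exactly one move per symbol
    return state in F


print(dfa_accepts("0110"))  # True (two 1s)
print(dfa_accepts("10"))    # False (one 1)
```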

FSMs come in two flavors:

DFA (Deterministic Finite Automaton): Exactly one transition per symbol per state.
NFA (Nondeterministic Finite Automaton): Zero, one, or many transitions per symbol per state, and possibly epsilon transitions (transitions that consume no input).

Both have the same expressive power as regular expressions.
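To make nondeterminism concrete, here is a Python sketch of NFA simulation that tracks the set of states the machine could be in, following epsilon transitions through an epsilon closure. The automaton (strings over $\{a, b\}$ ending in "ab"), its state names, and the use of "" as the epsilon label are assumptions for illustration. Tracking sets of states this way is the idea behind the subset construction, one way to see that NFAs are no more powerful than DFAs.

```python
from typing import Dict, Set, Tuple

EPS = ""  # label used here for epsilon transitions

# Hypothetical NFA over {a, b} accepting strings that end in "ab".
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("s", EPS): {"p"},          # epsilon move: no input consumed
    ("p", "a"): {"p", "q"},     # two possible moves on the same symbol
    ("p", "b"): {"p"},
    ("q", "b"): {"r"},
}
START, ACCEPT = "s", {"r"}


def eps_closure(states: Set[str]) -> Set[str]:
    """All states reachable from `states` using only epsilon transitions."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in DELTA.get((stack.pop(), EPS), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_accepts(word: str) -> bool:
    current = eps_closure({START})
    for symbol in word:
        step: Set[str] = set()
        for state in current:
            step |= DELTA.get((state, symbol), set())  # zero, one, or many moves
        current = eps_closure(step)
    return bool(current & ACCEPT)


print(nfa_accepts("aab"))  # True
print(nfa_accepts("aba"))  # False
```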

Regular Expression Basics

Regular expressions use a small set of operators to describe patterns:

Epsilon: $\epsilon$ matches the empty string.
Literals: Individual characters like a or 1.
Concatenation: ab matches "a" followed by "b".
Union (alternation): $a | b$ matches "a" or "b".
Kleene star: $a^*$ matches zero or more occurrences of "a".
Parentheses: Group sub-expressions, e.g., $(ab)^*$ matches zero or more occurrences of "ab".
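These operators can be tried out with Python's `re` module, which uses the same core syntax. `re` supports far more than this (for example backreferences, which go beyond regular languages); the sketch below uses only the core subset, and relies on `re.fullmatch` so that a pattern must match the whole string, which corresponds to membership in the language. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, word: str) -> bool:
    """True if the whole word is in the language described by pattern."""
    return re.fullmatch(pattern, word) is not None


print(matches("ab", "ab"))       # concatenation: True
print(matches("a|b", "b"))       # union: True
print(matches("a*", ""))         # Kleene star allows zero occurrences: True
print(matches("(ab)*", "abab"))  # grouping + star: True
print(matches("(ab)*", "aba"))   # False
```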

The State Elimination Method

State elimination is one of the most intuitive and widely taught methods for converting an FSM to a regex. The idea is to remove states one at a time, replacing the paths through each removed state with regex-labeled transitions between the remaining states.

Preparation

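Merge parallel edges: if several transitions go from the same state to the same state, replace them with a single edge labeled with the union of their symbols (for example, edges labeled $a$ and $b$ become one edge labeled $a | b$).
Add a new start state $q_s$ with an epsilon transition to the original start state. This is strictly required only when the original start state has incoming transitions, but adding it always keeps the procedure uniform.
Add a new accept state $q_f$, connect each original accept state to $q_f$ with an epsilon transition, and make the original accept states non-accepting. This is strictly required only when there are multiple accept states or the accept state has outgoing transitions.

After preparation, the FSM has exactly one start state (with no incoming edges) and one accept state (with no outgoing edges).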
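A minimal Python sketch of the preparation step, assuming the FSM is given as (source, symbol, target) triples with "" standing for $\epsilon$. The function name `prepare` and the state names `q_s`/`q_f` are hypothetical, and the sketch assumes those names are not already in use. It adds both new states unconditionally; the old accept states become ordinary states simply because only `q_f` is treated as accepting afterwards.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edges = Dict[Tuple[str, str], str]  # (source, target) -> regex label; "" is epsilon


def prepare(transitions: Iterable[Tuple[str, str, str]], start: str,
            accepts: List[str]) -> Edges:
    """Return a single-start, single-accept edge map ready for state elimination.

    transitions are (source, symbol, target) triples; "" stands for epsilon.
    Assumes the names "q_s" and "q_f" are not already used by the FSM.
    """
    symbols: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for src, symbol, dst in transitions:
        if symbol not in symbols[(src, dst)]:
            symbols[(src, dst)].append(symbol)
    # Merge parallel edges into one union label, e.g. a and b become "a|b".
    edges: Edges = {pair: "|".join(syms) for pair, syms in symbols.items()}
    edges[("q_s", start)] = ""   # new start state with an epsilon edge to the old one
    for f in accepts:
        edges[(f, "q_f")] = ""   # epsilon edge from each old accept state to q_f
    return edges


print(prepare([("p", "a", "r"), ("p", "b", "r")], "p", ["r"]))
# {('p', 'r'): 'a|b', ('q_s', 'p'): '', ('r', 'q_f'): ''}
```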

Elimination Procedure

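For each state $q_k$ to be eliminated (any state other than $q_s$ and $q_f$):

For every pair of states $(q_i, q_j)$ other than $q_k$ (possibly with $q_i = q_j$, in which case the new edge is a self-loop) such that $q_i$ has an edge to $q_k$ labeled $R_1$ and $q_k$ has an edge to $q_j$ labeled $R_3$, let $R_2$ be the label of the self-loop on $q_k$, taking $R_2^* = \epsilon$ if there is no self-loop:
Add (or update) an edge from $q_i$ to $q_j$ with the regex:
$R_1 \cdot R_2^* \cdot R_3$
If there is already an edge from $q_i$ to $q_j$ labeled $R_4$, the new label becomes:
$R_4 \;|\; R_1 \cdot R_2^* \cdot R_3$
Parenthesize any label that contains a union before concatenating or starring it; for example, with $R_2 = a | b$ the middle factor is $(a | b)^*$, not $a | b^*$.
Remove $q_k$ and all its edges.
Repeat until only $q_s$ and $q_f$ remain.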

The label on the single remaining edge from $q_s$ to $q_f$ is the regex for the language.
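The procedure above can be sketched in Python as follows, assuming the automaton has already been prepared into a dictionary that maps (source, target) pairs to regex strings in Python `re` syntax, with "" standing for $\epsilon$ and the hypothetical names `q_s`/`q_f` for the added states. The helpers `group`, `concat`, `star`, and `union` are hypothetical names for this sketch. Parenthesization is deliberately conservative (any fragment containing `|` is wrapped before concatenation), so results may contain redundant parentheses, and no further simplification is attempted.

```python
from typing import Dict, List, Optional, Tuple

# (source, target) -> regex label; "" stands for epsilon.
Edges = Dict[Tuple[str, str], str]


def group(r: str) -> str:
    """Parenthesize r if it contains a union, so concatenation cannot split it."""
    return f"({r})" if "|" in r else r


def concat(*parts: str) -> str:
    """Concatenate regex fragments; epsilon ("") is the identity."""
    return "".join(group(p) for p in parts if p)


def star(r: Optional[str]) -> str:
    """R_2^*: a missing (None) or epsilon ("") self-loop contributes only epsilon."""
    if not r:
        return ""
    return f"{r}*" if len(r) == 1 else f"({r})*"


def union(existing: Optional[str], new: str) -> str:
    """R_4 | R_1 R_2^* R_3, or just the new path if there was no direct edge."""
    return new if existing is None else f"{existing}|{new}"


def eliminate_states(edges: Edges, order: List[str]) -> Optional[str]:
    """Eliminate every original state in `order`; return the q_s -> q_f label.

    None means no edge survives, i.e. the language is empty.
    """
    edges = dict(edges)
    for k in order:
        loop = edges.pop((k, k), None)                      # R_2, if any
        incoming = [(i, r) for (i, j), r in edges.items() if j == k]
        outgoing = [(j, r) for (i, j), r in edges.items() if i == k]
        for i, r1 in incoming:
            for j, r3 in outgoing:
                path = concat(r1, star(loop), r3)           # R_1 R_2^* R_3
                edges[(i, j)] = union(edges.get((i, j)), path)
        # Remove q_k together with every edge that touches it.
        edges = {pair: r for pair, r in edges.items() if k not in pair}
    return edges.get(("q_s", "q_f"))


prepared = {("q_s", "q0"): "", ("q0", "q0"): "a", ("q0", "q1"): "b",
            ("q1", "q2"): "a", ("q2", "q_f"): ""}
print(eliminate_states(prepared, ["q1", "q0", "q2"]))  # a*ba
```

The `prepared` dictionary encodes the automaton used in the worked example later in this article, after preparation. The elimination order is a parameter because different orders can give different, but equivalent, expressions.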

Worked Example

Consider an NFA with:

States: $\{q_0, q_1, q_2\}$
Alphabet: $\{a, b\}$
Start state: $q_0$
Accept state: $q_2$
Transitions:
- $\delta(q_0, a) = q_0$ (self-loop)
- $\delta(q_0, b) = q_1$
- $\delta(q_1, a) = q_2$

Step 1: Preparation

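The start state $q_0$ has a self-loop (an incoming edge), so add a new start state $q_s$ with an epsilon edge to $q_0$. The accept state $q_2$ has no outgoing edges, so a new accept state is not strictly required; add $q_f$ with an epsilon edge from $q_2$ anyway, so that the general procedure applies unchanged, and make $q_2$ an ordinary state.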

Step 2: Eliminate $q_1$

State $q_1$ has:

Incoming edge from $q_0$ labeled $b$
Outgoing edge to $q_2$ labeled $a$
No self-loop

The new edge from $q_0$ to $q_2$ is labeled $ba$ (since there is no self-loop, $R_2^* = \epsilon$).

Remove $q_1$.

Step 3: Eliminate $q_0$

State $q_0$ has:

Incoming edge from $q_s$ labeled $\epsilon$
Self-loop labeled $a$
Outgoing edge to $q_2$ labeled $ba$

The new edge from $q_s$ to $q_2$ is:

$\epsilon \cdot a^* \cdot ba = a^*ba$

Step 4: Eliminate $q_2$

State $q_2$ has:

Incoming edge from $q_s$ labeled $a^*ba$
Outgoing edge to $q_f$ labeled $\epsilon$
No self-loop

The final edge from $q_s$ to $q_f$ is labeled $a^*ba \cdot \epsilon = a^*ba$.

Result: The equivalent regular expression is $a^*ba$.
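One way to sanity-check the result is to compare the automaton and the regex on every short input. The Python sketch below simulates the example automaton directly (a missing transition means the input is rejected) and checks agreement with `re.fullmatch` on all strings over $\{a, b\}$ up to a chosen length. The function names are hypothetical, and this is a bounded test rather than a proof of equivalence.

```python
import re
from itertools import product

# Transitions of the worked example; missing entries mean "no move" (reject).
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q2"}
START, ACCEPT = "q0", {"q2"}


def fsm_accepts(word: str) -> bool:
    state = START
    for symbol in word:
        state = DELTA.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPT


def agree_up_to(pattern: str, max_len: int) -> bool:
    """Compare the FSM and the regex on every string over {a, b} of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if fsm_accepts(word) != (re.fullmatch(pattern, word) is not None):
                return False
    return True


print(agree_up_to("a*ba", 8))  # True
print(agree_up_to("ba", 8))    # False: "ba" misses "aba"
```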

Summary Table

Step	Description
Preparation	Add a unique start state with no incoming edges and a unique accept state with no outgoing edges
State Elimination	Remove states one by one, merging transitions into regex labels
Transition Update	New label: $R_1 \cdot R_2^* \cdot R_3$, unioned with any existing edge
Final Result	The label on the sole remaining edge is the regex

Complexity and Practical Notes

In the worst case, the size of the resulting regex is exponential in the number of states. Different elimination orders produce different (but equivalent) regular expressions, and some orders yield much shorter results than others. In practice:

Eliminate states with the fewest incoming and outgoing edges first, since each elimination creates one new label per (incoming, outgoing) pair; this keeps intermediate expressions small.
Simplify the labels after each elimination step (e.g., $\epsilon | a = a?$, $a | a = a$).
For larger automata, consider Brzozowski's algebraic method, which treats the FSM as a system of language equations and solves it with Arden's Lemma.
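As a cross-check, and as a sketch of the equation-based approach, the worked example can be solved algebraically. Here $X_i$ denotes the set of strings accepted when starting in $q_i$ (notation introduced only for this sketch). Arden's Lemma states that if $\epsilon \notin A$, the equation $X = AX \,|\, B$ has the unique solution $X = A^*B$.

```latex
\begin{aligned}
X_2 &= \epsilon && \text{($q_2$ accepts and has no outgoing edges)}\\
X_1 &= a\,X_2 = a\\
X_0 &= a\,X_0 \;|\; b\,X_1 = a\,X_0 \;|\; ba\\
X_0 &= a^*\,ba && \text{(Arden's Lemma with } A = a,\ B = ba\text{)}
\end{aligned}
```

The language of the automaton is $X_0$, which agrees with the state elimination result $a^*ba$.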

Common Pitfalls

Forgetting to handle the self-loop on the eliminated state. Without $R_2^*$ , paths that loop through the state are lost.
Not merging with existing edges (using union) when a direct edge already exists between $q_i$ and $q_j$.
Skipping the preparation step. If the start state has incoming edges or there are multiple accept states, the algorithm produces incorrect results.
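The first pitfall is easy to observe on the worked example. If the self-loop $a$ on $q_0$ is ignored in Step 3, the result is $\epsilon \cdot ba = ba$ instead of $a^*ba$. This small Python check (variable names are illustrative) shows which inputs the wrong expression loses:

```python
import re

correct = "a*ba"     # Step 3 with the self-loop factor a* included
dropped_loop = "ba"  # Step 3 if the self-loop a on q_0 is ignored

for word in ["ba", "aba", "aaba"]:
    print(word, re.fullmatch(correct, word) is not None,
          re.fullmatch(dropped_loop, word) is not None)
# ba True True
# aba True False
# aaba True False
```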

Conclusion

Converting a finite state machine to a regular expression using state elimination is a systematic, mechanical process. Each state removal translates a set of transitions into a regex fragment, and the final result captures the entire language of the FSM. State elimination establishes one direction of Kleene's theorem, a foundational result of the theory of computation: every FSM has an equivalent regular expression. Together with the converse construction from regular expressions to FSMs (for example, Thompson's construction), it shows that both formalisms have exactly the same expressive power.
