finite state machine
regular expression
automata theory
conversion method
computer science

Convert finite state machine to regular expression

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finite state machines (FSMs) and regular expressions (regex) are two equivalent ways to describe regular languages. An FSM processes input character by character through states and transitions, while a regex describes the same language as an algebraic pattern. Converting an FSM to its equivalent regex is a foundational technique in automata theory, used in compiler design, lexical analysis, and text processing. This article walks through the state elimination method, the most practical conversion algorithm.

Finite State Machine Basics

A finite state machine is defined by five components:

  • States QQ: A finite set of states.
  • Alphabet Σ\Sigma: A finite set of input symbols.
  • Transition function δ\delta: Maps (state, symbol) pairs to next states.
  • Start state q0q_0: The state where processing begins.
  • Accept states FF: States that indicate a valid input sequence.

FSMs come in two flavors:

  • DFA (Deterministic Finite Automaton): Exactly one transition per symbol per state.
  • NFA (Nondeterministic Finite Automaton): Zero, one, or many transitions per symbol, including epsilon transitions (transitions without consuming input).

Both have the same expressive power as regular expressions.

Regular Expression Basics

Regular expressions use a small set of operators to describe patterns:

  • Literals: Individual characters like a or 1.
  • Concatenation: ab matches "a" followed by "b".
  • Union (alternation): aba | b matches "a" or "b".
  • Kleene star: aa^* matches zero or more occurrences of "a".
  • Parentheses: Group sub-expressions, e.g., (ab)(ab)^* matches zero or more occurrences of "ab".

The State Elimination Method

State elimination is the most intuitive and widely taught method for converting an FSM to a regex. The idea is to remove states one at a time, replacing the transitions through each removed state with regex-labeled transitions between the remaining states.

Preparation

  1. If the FSM has multiple accept states, add a new single accept state qfq_f and connect each original accept state to qfq_f with an epsilon transition.
  2. If the start state has incoming transitions, add a new start state qsq_s with an epsilon transition to the original start state.

After preparation, the FSM has exactly one start state (with no incoming edges) and one accept state (with no outgoing edges).

Elimination Procedure

For each state qkq_k to be eliminated (not qsq_s or qfq_f):

  1. For every pair of states (qi,qj)(q_i, q_j) where qiq_i has an edge to qkq_k labeled R1R_1 and qkq_k has an edge to qjq_j labeled R3R_3, and qkq_k has a self-loop labeled R2R_2:
    Add (or update) an edge from qiq_i to qjq_j with the regex:
    R1R2R3R_1 \cdot R_2^* \cdot R_3
    If there is already an edge from qiq_i to qjq_j labeled R4R_4, the new label becomes:
    R4    R1R2R3R_4 \;|\; R_1 \cdot R_2^* \cdot R_3
  2. Remove qkq_k and all its edges.
  3. Repeat until only qsq_s and qfq_f remain.

The label on the single remaining edge from qsq_s to qfq_f is the regex for the language.

Worked Example

Consider an NFA with:

  • States: q0,q1,q2{q_0, q_1, q_2}
  • Alphabet: a,b{a, b}
  • Start state: q0q_0
  • Accept state: q2q_2
  • Transitions:
    • δ(q0,a)=q0\delta(q_0, a) = q_0 (self-loop)
    • δ(q0,b)=q1\delta(q_0, b) = q_1
    • δ(q1,a)=q2\delta(q_1, a) = q_2

Step 1: Preparation

The start state q0q_0 has a self-loop (incoming edge), so add a new start state qsq_s with an epsilon edge to q0q_0. The accept state q2q_2 has no outgoing edges, so add a new accept state qfq_f with an epsilon edge from q2q_2.

Step 2: Eliminate q1q_1

State q1q_1 has:

  • Incoming edge from q0q_0 labeled bb
  • Outgoing edge to q2q_2 labeled aa
  • No self-loop

The new edge from q0q_0 to q2q_2 is labeled baba (since there is no self-loop, R2=ϵR_2^* = \epsilon).

Remove q1q_1.

Step 3: Eliminate q0q_0

State q0q_0 has:

  • Incoming edge from qsq_s labeled ϵ\epsilon
  • Self-loop labeled aa
  • Outgoing edge to q2q_2 labeled baba

The new edge from qsq_s to q2q_2 is:

ϵaba=aba\epsilon \cdot a^* \cdot ba = a^*ba

Step 4: Eliminate q2q_2

State q2q_2 has:

  • Incoming edge from qsq_s labeled abaa^*ba
  • Outgoing edge to qfq_f labeled ϵ\epsilon
  • No self-loop

The final edge from qsq_s to qfq_f is abaa^*ba.

Result: The equivalent regular expression is abaa^*ba.

Summary Table

StepDescription
PreparationEnsure unique start/accept states with no conflicting edges
State EliminationRemove states one by one, merging transitions into regex labels
Transition UpdateNew label: R1R2R3R_1 \cdot R_2^* \cdot R_3, union with existing edges
Final ResultThe label on the sole remaining edge is the regex

Complexity and Practical Notes

The size of the resulting regex can be exponential in the number of states. Different elimination orders produce different (but equivalent) regex expressions, and some orders yield simpler results than others. In practice:

  • Eliminate states with the fewest connections first to keep intermediate expressions small.
  • Simplify regex expressions after each elimination step (e.g., ϵa=a?\epsilon | a = a?, aa=aa | a = a).
  • For DFAs with many states, consider Brzozowski's algebraic method or Arden's Lemma as alternative approaches.

Common Pitfalls

  • Forgetting to handle the self-loop on the eliminated state. Without R2R_2^*, paths that loop through the state are lost.
  • Not merging with existing edges (using union) when a direct edge already exists between qiq_i and qjq_j.
  • Skipping the preparation step. If the start state has incoming edges or there are multiple accept states, the algorithm produces incorrect results.

Conclusion

Converting a finite state machine to a regular expression using state elimination is a systematic, mechanical process. Each state removal translates a set of transitions into a regex fragment, and the final result captures the entire language of the FSM. This equivalence between FSMs and regex is one of the foundational results of computation theory, proving that both formalisms have exactly the same expressive power.


Course illustration
Course illustration

All Rights Reserved.