finding substrings of a string
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding substrings is one of the most common operations in text processing. Sometimes you only need to know whether a pattern exists; other times you need the first index, every occurrence, or an algorithm that scales well to large inputs.
Basic Substring Search
At the simplest level, substring search means checking whether one string appears contiguously inside another.
In Python, the most direct options are in, find, and index:
These tools answer slightly different questions:
- '
inreturns a boolean' - '
findreturns the index or-1' - '
indexreturns the index or raises an exception if not found'
For most application code, one of these is enough.
Finding Every Occurrence
If the substring may appear multiple times, loop through the string and continue the search from the previous match.
This prints [1, 3], which shows overlapping matches. Advancing by 1 rather than by len(pattern) is what allows overlaps to be found.
Slicing and Manual Comparison
Under the hood, a straightforward substring algorithm compares slices of the source string against the pattern.
This is easy to understand and useful for teaching, but it can become inefficient when the text is large and the comparisons repeat a lot of work.
When Algorithm Choice Matters
For ordinary business code, built-in substring search is usually the right answer because language runtimes are heavily optimized. But in algorithm-heavy workloads, classic pattern-matching strategies matter:
- Knuth-Morris-Pratt avoids rechecking known prefixes
- Boyer-Moore skips forward aggressively on mismatches
- Rabin-Karp uses hashing and is helpful in some multi-pattern scenarios
You do not need to reimplement these unless performance profiling shows substring search is the actual bottleneck. Most of the time, using the standard library is simpler and safer.
Regular Expressions for Pattern-Like Searches
If the thing you are looking for is not a fixed substring but a pattern, regular expressions are often a better fit.
This is different from plain substring search because the match can vary in content while still following a structured rule.
Common Pitfalls
- Using
indexwhen the substring may not exist causes an exception. Usefindif "not found" is a normal case. - Forgetting about overlapping matches misses valid results when patterns can share characters.
- Reaching for regular expressions when you only need a fixed substring adds complexity for no real benefit.
- Reimplementing advanced search algorithms too early usually makes code harder to maintain without measurable gain.
Case sensitivity is another easy source of bugs when user input and stored text are normalized differently.
Summary
- Use
in,find, orindexfor most fixed-substring searches. - Loop with a moving start position when you need every occurrence.
- Built-in search is usually faster and safer than custom code.
- Switch to regular expressions only when the search target is a real pattern rather than a fixed substring.

