Get the index of the nth occurrence of a string?

Introduction

Finding the index of the nth occurrence of a substring is a common text-processing task in parsing, log analysis, and validation code. The main challenge is handling edge cases such as overlapping matches and missing occurrences. A reliable solution should define those rules up front and keep complexity clear.

Core Sections

Define What nth occurrence Means

Before writing code, decide whether matches can overlap. For example, in "aaaa", the substring "aa" starts at indices 0, 1, and 2 if overlap is allowed, but only at 0 and 2 in non-overlapping mode. Implementations that leave this choice implicit can return different indexes for the same input.
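
To make the two conventions concrete, here is a minimal sketch that lists every match position under each rule. The helper name all_indices and its overlap parameter are illustrative assumptions, not part of any standard API.

```python
def all_indices(text: str, needle: str, overlap: bool) -> list:
    """Return the start index of every match of needle in text."""
    if needle == "":
        raise ValueError("needle must not be empty")
    # Overlapping search resumes one character after each match start;
    # non-overlapping search resumes after the whole match.
    step = 1 if overlap else len(needle)
    indices = []
    start = text.find(needle)
    while start != -1:
        indices.append(start)
        start = text.find(needle, start + step)
    return indices

print(all_indices("aaaa", "aa", overlap=True))   # [0, 1, 2]
print(all_indices("aaaa", "aa", overlap=False))  # [0, 2]
```

The nth occurrence is then simply the nth entry of whichever list matches the rule you chose, which is why the rule has to be fixed first.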

Non-overlapping Search with str.find

A simple, practical baseline is repeated str.find with a moving start index.

python
def nth_index(text: str, needle: str, n: int) -> int:
    """Return the index of the nth non-overlapping match, or -1 if absent."""
    if n <= 0:
        raise ValueError("n must be >= 1")
    if needle == "":
        raise ValueError("needle must not be empty")

    start = 0
    for _ in range(n):
        idx = text.find(needle, start)
        if idx == -1:
            return -1
        # Resume after the whole match so matches never overlap.
        start = idx + len(needle)
    return idx

print(nth_index("one two two three two", "two", 2))  # 8
print(nth_index("abc", "x", 1))  # -1

This method is easy to read, and because each find call resumes after the previous match, the text is scanned roughly once overall.

Overlapping Search Variant

If overlap is required, advance by one character instead of len(needle).

python
def nth_index_overlapping(text: str, needle: str, n: int) -> int:
    """Return the index of the nth match, counting overlapping matches."""
    if n <= 0:
        raise ValueError("n must be >= 1")
    if needle == "":
        raise ValueError("needle must not be empty")

    start = 0
    for _ in range(n):
        idx = text.find(needle, start)
        if idx == -1:
            return -1
        # Advance by one character so the next match may overlap this one.
        start = idx + 1
    return idx

print(nth_index_overlapping("aaaa", "aa", 2))  # 1

Explicit overlap behavior prevents ambiguity in downstream logic.

Regex Option for Pattern-based Matches

If matching rules are more complex than plain substring search, use regular expressions. Regex can also find overlapping matches by wrapping the pattern in a zero-width lookahead, such as (?=aa).

python
import re

def nth_regex_index(text: str, pattern: str, n: int) -> int:
    """Return the start index of the nth regex match, or -1 if absent."""
    if n <= 0:
        raise ValueError("n must be >= 1")

    # Collects every match up front; simple, but costly on large inputs.
    matches = list(re.finditer(pattern, text))
    if len(matches) < n:
        return -1
    return matches[n - 1].start()

print(nth_regex_index("ID-12 ID-34 ID-56", r"ID-\d+", 3))  # 12

For very large text, avoid materializing all matches when you only need one target occurrence.
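
One way to avoid building the full list is to consume the re.finditer iterator lazily. The sketch below uses itertools.islice for that; the function name nth_regex_index_lazy is an assumption made for illustration. It also shows the lookahead form of overlapping matching.

```python
import re
from itertools import islice

def nth_regex_index_lazy(text: str, pattern: str, n: int) -> int:
    """Return the start of the nth regex match, or -1, without building a list."""
    if n <= 0:
        raise ValueError("n must be >= 1")
    # islice skips the first n - 1 matches and stops after the nth one,
    # so matches beyond the nth are never computed.
    match = next(islice(re.finditer(pattern, text), n - 1, n), None)
    return -1 if match is None else match.start()

print(nth_regex_index_lazy("ID-12 ID-34 ID-56", r"ID-\d+", 3))  # 12

# A zero-width lookahead matches at every position where "aa" begins,
# so the match starts are 0, 1, and 2 (overlapping).
print(nth_regex_index_lazy("aaaa", r"(?=aa)", 3))  # 2

# The plain pattern consumes each match, leaving only starts 0 and 2.
print(nth_regex_index_lazy("aaaa", r"aa", 3))  # -1
```

With a lookahead pattern the match itself is empty, so only start() is meaningful; the matched text has to be recovered separately if it is needed.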

Streaming and Large-file Considerations

When processing large files, reading all content into memory may be expensive. Stream line by line, track cumulative offset, and stop as soon as the nth occurrence is found.

python
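def nth_in_lines(lines, needle, n):
    """Return the character offset of the nth match across lines, or -1."""
    if n <= 0:
        raise ValueError("n must be >= 1")
    if needle == "":
        raise ValueError("needle must not be empty")

    count = 0
    offset = 0  # characters consumed by earlier lines
    for line in lines:
        pos = 0
        while True:
            i = line.find(needle, pos)
            if i == -1:
                break
            count += 1
            if count == n:
                return offset + i
            pos = i + len(needle)
        # Lines are expected to keep their trailing newline.
        offset += len(line)
    return -1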

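Because only one line is held in memory at a time, this approach scales better for long logs and batch pipelines. The returned offset counts characters and assumes each line keeps its newline, as lines read from a file object do; a match that spans a line break is not found.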
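
As a usage sketch, the example below runs the line-based search over an io.StringIO object, which stands in for an open text file. The sample log text is invented for illustration, and the function is repeated in compact form only so the snippet runs on its own.

```python
import io

def nth_in_lines(lines, needle, n):
    # Same line-by-line logic as in this section, repeated for a standalone run.
    count = 0
    offset = 0
    for line in lines:
        pos = line.find(needle)
        while pos != -1:
            count += 1
            if count == n:
                return offset + pos
            pos = line.find(needle, pos + len(needle))
        offset += len(line)
    return -1

# Hypothetical log content, used only for this demonstration.
log = "INFO start\nERROR disk\nINFO ok\nERROR net\n"

# Iterating a file-like object yields lines with their trailing "\n",
# so the streamed offset agrees with a search over the full text.
print(nth_in_lines(io.StringIO(log), "ERROR", 2))  # 30

# splitlines() drops the newlines, so the offset drifts by one per line.
print(nth_in_lines(log.splitlines(), "ERROR", 2))  # 27
```

The second call shows why the source of the lines matters: any iterable that strips line endings needs its offset adjusted by the length of the removed terminator.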

Testing Strategy

Write tests for missing results, first and last occurrences, overlap behavior, and invalid inputs. Keep tests explicit so behavior stays stable during refactoring.

python
assert nth_index("ababab", "ab", 3) == 4
assert nth_index("abab", "ab", 3) == -1
assert nth_index_overlapping("aaaa", "aa", 3) == 2
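
The assertions above do not cover invalid inputs. A minimal sketch of how those cases might be grouped with the standard unittest module follows; the class and test names are illustrative, and a compact copy of nth_index is included so the tests run on their own.

```python
import unittest

def nth_index(text, needle, n):
    # Compact copy of the non-overlapping search, so this file is standalone.
    if n <= 0:
        raise ValueError("n must be >= 1")
    if needle == "":
        raise ValueError("needle must not be empty")
    start = 0
    for _ in range(n):
        idx = text.find(needle, start)
        if idx == -1:
            return -1
        start = idx + len(needle)
    return idx

class NthIndexTests(unittest.TestCase):
    def test_first_and_last(self):
        self.assertEqual(nth_index("ababab", "ab", 1), 0)
        self.assertEqual(nth_index("ababab", "ab", 3), 4)

    def test_missing(self):
        self.assertEqual(nth_index("abab", "ab", 3), -1)
        self.assertEqual(nth_index("", "ab", 1), -1)

    def test_invalid_inputs(self):
        with self.assertRaises(ValueError):
            nth_index("abab", "ab", 0)
        with self.assertRaises(ValueError):
            nth_index("abab", "", 1)

# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NthIndexTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping one test method per behavior makes it obvious which rule broke when a refactor changes the indexing logic.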

API Design for Reusable Text Utilities

When this logic is reused across services, package it as a utility with explicit options such as overlapping mode and not-found behavior. Returning -1 is common, but some codebases prefer None or raised exceptions for missing occurrences. Pick one style and document it clearly.

python
def nth_occurrence(text, needle, n, overlap=False):
    """Return the index of the nth match of needle in text, or -1."""
    # Validate before using the inputs.
    if n <= 0 or needle == "":
        raise ValueError("n must be >= 1 and needle must not be empty")

    # Overlapping mode advances one character; otherwise skip the whole match.
    step = 1 if overlap else len(needle)
    start = 0
    for _ in range(n):
        idx = text.find(needle, start)
        if idx == -1:
            return -1
        start = idx + step
    return idx

print(nth_occurrence("aaaa", "aa", 2, overlap=False))  # 2
print(nth_occurrence("aaaa", "aa", 2, overlap=True))  # 1

This utility form makes call sites easier to read and keeps edge-case behavior centralized.
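
For codebases that prefer None or an exception over -1, one possible shape is a pair of functions that mirrors the built-in str.find and str.index split. The names find_nth and index_nth are assumptions chosen for this sketch.

```python
from typing import Optional

def find_nth(text: str, needle: str, n: int, overlap: bool = False) -> Optional[int]:
    """Return the index of the nth match, or None when there are fewer than n."""
    if n <= 0:
        raise ValueError("n must be >= 1")
    if needle == "":
        raise ValueError("needle must not be empty")
    step = 1 if overlap else len(needle)
    start = 0
    idx = None
    for _ in range(n):
        found = text.find(needle, start)
        if found == -1:
            return None
        idx = found
        start = found + step
    return idx

def index_nth(text: str, needle: str, n: int, overlap: bool = False) -> int:
    """Like find_nth, but raise ValueError when the occurrence is missing."""
    idx = find_nth(text, needle, n, overlap)
    if idx is None:
        raise ValueError("substring occurrence not found")
    return idx

print(find_nth("aaaa", "aa", 2))                # 2
print(find_nth("aaaa", "aa", 2, overlap=True))  # 1
print(find_nth("abc", "x", 1))                  # None
```

One argument for None over -1 in Python is that -1 is itself a valid index, so a forgotten not-found check can silently read from the end of the string instead of failing.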

Common Pitfalls

  • Not defining overlap behavior and returning inconsistent indexes.
  • Silently accepting n <= 0 instead of validating input.
  • Forgetting to guard against empty substring searches.
  • Building regex-only solutions for simple exact matches and adding unnecessary complexity.
  • Materializing all matches when only one target occurrence is needed.

Summary

  • Clarify overlap rules before implementation.
  • Use repeated find for straightforward non-overlapping searches.
  • Use one-step advancement for overlapping match logic.
  • Use regex when pattern matching is genuinely needed.
  • Add edge-case tests to keep indexing behavior predictable.
