Finding multiple entries with binary search

binary search

algorithm

computer science

data structures

searching techniques

Finding multiple entries with binary search

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Standard binary search tells you whether a value exists in a sorted array and often returns one matching index. If duplicates are allowed, that is not enough. To find all matching entries efficiently, the usual approach is to run binary search twice: once for the left boundary and once for the right boundary.

Why One Binary Search Is Not Enough

Suppose the sorted array is:

text

[1, 2, 4, 4, 4, 7, 9]

A normal binary search for 4 may return index 3, but that does not tell you whether matching values also appear at indices 2 and 4.

So the real problem becomes:

find the first occurrence
find the last occurrence
optionally compute the count or return the slice of matching indexes

That keeps the time complexity logarithmic for the search phase instead of scanning the whole array linearly.

Find the Leftmost Match

The left boundary search is a slightly modified binary search. When you find the target, you keep searching left instead of stopping immediately.

python

1def find_first(nums, target):
2    left, right = 0, len(nums) - 1
3    answer = -1
4
5    while left <= right:
6        mid = (left + right) // 2
7        if nums[mid] < target:
8            left = mid + 1
9        elif nums[mid] > target:
10            right = mid - 1
11        else:
12            answer = mid
13            right = mid - 1
14
15    return answer

If the value does not exist, the function returns -1.

Find the Rightmost Match

The right boundary is symmetric. When you find the target, keep searching right.

python

1def find_last(nums, target):
2    left, right = 0, len(nums) - 1
3    answer = -1
4
5    while left <= right:
6        mid = (left + right) // 2
7        if nums[mid] < target:
8            left = mid + 1
9        elif nums[mid] > target:
10            right = mid - 1
11        else:
12            answer = mid
13            left = mid + 1
14
15    return answer

Now you have the full range of duplicates.

Put the Two Searches Together

python

1def find_all_range(nums, target):
2    first = find_first(nums, target)
3    if first == -1:
4        return (-1, -1, 0)
5
6    last = find_last(nums, target)
7    count = last - first + 1
8    return (first, last, count)
9
10
11nums = [1, 2, 4, 4, 4, 7, 9]
12print(find_all_range(nums, 4))

This returns the first index, the last index, and the number of matching entries.

If you actually need the matching values or indexes, you can derive them from that range.

Complexity and Tradeoffs

Each boundary search is O(log n), so the search work stays O(log n) overall. That is much better than scanning the entire array when the array is large.

If you need to physically return every matching index, the output step costs O(k), where k is the number of matches. That is unavoidable because writing k results takes time proportional to k.

When This Pattern Is Useful

This approach is useful whenever duplicates are expected in sorted data, for example:

database-style sorted IDs
timestamp buckets
lookup tables with repeated values
frequency counting in sorted arrays

The key requirement is that the input must already be sorted.

Common Pitfalls

Stopping at the first match and assuming it represents the whole duplicate range.
Forgetting that the array must be sorted before binary search is valid.
Returning one boundary correctly but using an off-by-one formula for the count.
Falling back to a full scan when a boundary search would stay logarithmic.

Summary

A normal binary search only guarantees one matching position.
To find all duplicates efficiently, search for the first and last occurrence separately.
Two boundary searches still run in O(log n) time.
Returning every matching index adds output cost proportional to the number of matches.
This technique only works correctly on sorted input.