Finding multiple entries with binary search
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Standard binary search tells you whether a value exists in a sorted array and often returns one matching index. If duplicates are allowed, that is not enough. To find all matching entries efficiently, the usual approach is to run binary search twice: once for the left boundary and once for the right boundary.
Why One Binary Search Is Not Enough
Suppose the sorted array is:
A normal binary search for 4 may return index 3, but that does not tell you whether matching values also appear at indices 2 and 4.
So the real problem becomes:
- find the first occurrence
- find the last occurrence
- optionally compute the count or return the slice of matching indexes
That keeps the time complexity logarithmic for the search phase instead of scanning the whole array linearly.
Find the Leftmost Match
The left boundary search is a slightly modified binary search. When you find the target, you keep searching left instead of stopping immediately.
If the value does not exist, the function returns -1.
Find the Rightmost Match
The right boundary is symmetric. When you find the target, keep searching right.
Now you have the full range of duplicates.
Put the Two Searches Together
This returns the first index, the last index, and the number of matching entries.
If you actually need the matching values or indexes, you can derive them from that range.
Complexity and Tradeoffs
Each boundary search is O(log n), so the search work stays O(log n) overall. That is much better than scanning the entire array when the array is large.
If you need to physically return every matching index, the output step costs O(k), where k is the number of matches. That is unavoidable because writing k results takes time proportional to k.
When This Pattern Is Useful
This approach is useful whenever duplicates are expected in sorted data, for example:
- database-style sorted IDs
- timestamp buckets
- lookup tables with repeated values
- frequency counting in sorted arrays
The key requirement is that the input must already be sorted.
Common Pitfalls
- Stopping at the first match and assuming it represents the whole duplicate range.
- Forgetting that the array must be sorted before binary search is valid.
- Returning one boundary correctly but using an off-by-one formula for the count.
- Falling back to a full scan when a boundary search would stay logarithmic.
Summary
- A normal binary search only guarantees one matching position.
- To find all duplicates efficiently, search for the first and last occurrence separately.
- Two boundary searches still run in
O(log n)time. - Returning every matching index adds output cost proportional to the number of matches.
- This technique only works correctly on sorted input.

