What's a good, generic algorithm for collapsing a set of potentially-overlapping ranges?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Collapsing a set of potentially overlapping ranges is a common problem in various fields such as computational biology, computer graphics, and database management. The objective is to merge overlapping or contiguous ranges into non-overlapping ranges, which often results in simplification and enhances efficiency during processing. This article explores a generic algorithm to handle this task, with technical details, examples, and a summary for easy understanding.
Problem Definition
Given a list of ranges where each range is defined by a start and end point, the goal is to merge overlapping or contiguous ranges into a single continuous range. For example, given the ranges:
- [1, 5]
- [3, 7]
- [8, 10]
- [9, 11]
The merged output should be:
- [1, 7]
- [8, 11]
Algorithm Explanation
The proposed algorithm is designed to efficiently collapse ranges by utilizing sorting and a single pass through the data. Here's a step-by-step breakdown:
- Sorting the Ranges: Sort the ranges by their start value. If two ranges have the same start value, sort by their end value. This ensures that overlapping or contiguous ranges are adjacent to each other.
- Initializing the Collapsed List: Start with an empty list, `collapsed_ranges`, to hold the merged output.
- Iterating Through the Ranges:
- For each range in the sorted list, check if there is an overlap or if the ranges are contiguous with the last collapsed range.
- Two ranges [a, b] and [c, d] are overlapping or contiguous if `c <= b + 1`.
- If overlapping or contiguous, merge the ranges by updating the end of the last range in `collapsed_ranges` to `max(b, d)`.
- Otherwise, add the current range as a new entry in `collapsed_ranges`.
- Output the Collapsed Ranges: After processing all the ranges, the `collapsed_ranges` list contains the merged, non-overlapping ranges.
Pseudocode
- Sorted: [1, 5], [3, 7], [8, 10], [9, 11]
- Start with [1, 5] in `collapsed_ranges`.
- Check [3, 7]: overlaps with [1, 5], merge to [1, 7].
- Add [8, 10] to `collapsed_ranges`.
- Check [9, 11]: overlaps with [8, 10], merge to [8, 11].
- Time Complexity: The algorithm's complexity is dominated by the sorting step, which is for `n` ranges. The subsequent merging is a linear operation, .
- Space Complexity: The algorithm uses additional space for storing the sorted list and `collapsed_ranges`, both of which require space.

