downsampling
algorithm
array intervals
data reduction
computational methods

Algorithm for downsampling array of intervals

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

There is no single correct way to downsample an array of intervals because the right algorithm depends on what you want to preserve. If the goal is to keep the overall coverage shape while reducing the number of intervals, a practical approach is to sort and merge first, then summarize the intervals into a fixed number of buckets.

Decide What "Downsampling" Means

For interval data, different goals lead to different algorithms:

  • preserve total covered length
  • preserve the rough distribution across the axis
  • keep the longest or most important intervals
  • reduce to at most k representative intervals

A common visualization-oriented goal is: "reduce this interval list to at most k intervals while preserving where coverage exists." That is the assumption used below.

Step 1: Merge Overlapping Intervals

Before downsampling, merge overlaps so the algorithm does not waste budget on redundant segments.

python
1def merge_intervals(intervals):
2    intervals = sorted(intervals)
3    merged = []
4
5    for start, end in intervals:
6        if not merged or start > merged[-1][1]:
7            merged.append([start, end])
8        else:
9            merged[-1][1] = max(merged[-1][1], end)
10
11    return [tuple(x) for x in merged]

This step is often more important than the downsampling itself because overlapping raw intervals can exaggerate density in ways that are not actually meaningful.

Step 2: Bucket the Axis and Summarize

After merging, divide the global range into k buckets and summarize intervals that land in each bucket. A simple summary is to keep the minimum start and maximum end seen in that bucket.

python
1import math
2
3
4def downsample_intervals(intervals, k):
5    merged = merge_intervals(intervals)
6    if len(merged) <= k:
7        return merged
8
9    global_start = min(start for start, _ in merged)
10    global_end = max(end for _, end in merged)
11    bucket_size = (global_end - global_start) / k
12
13    buckets = [[] for _ in range(k)]
14    for start, end in merged:
15        midpoint = (start + end) / 2
16        index = min(k - 1, math.floor((midpoint - global_start) / bucket_size))
17        buckets[index].append((start, end))
18
19    result = []
20    for bucket in buckets:
21        if bucket:
22            bucket_start = min(start for start, _ in bucket)
23            bucket_end = max(end for _, end in bucket)
24            result.append((bucket_start, bucket_end))
25
26    return result

This does not preserve every original boundary, but it keeps a representative sense of where intervals live along the axis.

Tradeoffs of This Strategy

The bucket approach is useful for dashboards, previews, and coarse summaries. It is less appropriate if exact boundaries matter or if a small short interval must never be swallowed by a longer neighbor in the same bucket.

If exactness matters more than visual shape, a different strategy may be better, such as:

  • keeping the longest k intervals
  • sampling by importance weight
  • choosing intervals that maximize covered length under a budget

In other words, the algorithm should match the meaning of the data, not just the fact that the data happens to be stored as intervals.

Common Pitfalls

  • Downsampling before merging overlaps and overcounting redundant structure.
  • Using midpoint buckets when exact boundaries are the real requirement.
  • Assuming one generic downsampling algorithm fits every interval problem.
  • Ignoring whether the output is for visualization, storage, or exact computation.
  • Forgetting that bucket summaries can merge distinct local gaps.

Summary

  • Downsampling intervals only makes sense once the preservation goal is clear.
  • Merging overlaps first is usually the right first step.
  • Bucket-based summarization is a practical way to reduce to about k intervals.
  • The result is approximate and best suited for coarse summaries or visualization.
  • Choose a different algorithm if exact interval boundaries must be preserved.

Course illustration
Course illustration

All Rights Reserved.