data structures
algorithms
query optimization
integer search
range queries

Data structure to find integers within a query range efficiently

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Range queries such as finding values where lo <= x <= hi appear in analytics, search, and indexing systems. The best data structure depends on query type, update frequency, and domain size. Picking by workload profile usually beats picking the most complex algorithm available.

Define Operations Before Choosing a Structure

Start by clarifying required operations:

  • Count values in range only, or return actual values.
  • Static dataset, append-only, or frequent insert and delete.
  • Duplicate values allowed or unique-only.
  • Integer domain bounded or very large and sparse.

These requirements determine whether a sorted array, balanced ordered container, Fenwick tree, or segment tree is a better fit.

For mostly static data with many queries, a sorted list with binary search is often the fastest and simplest choice.

python
1from bisect import bisect_left, bisect_right
2
3
4def range_values(sorted_nums, lo, hi):
5    left = bisect_left(sorted_nums, lo)
6    right = bisect_right(sorted_nums, hi)
7    return sorted_nums[left:right]
8
9
10def range_count(sorted_nums, lo, hi):
11    return bisect_right(sorted_nums, hi) - bisect_left(sorted_nums, lo)
12
13
14nums = sorted([8, 3, 5, 10, 7, 6, 6, 6])
15print(range_values(nums, 5, 8))
16print(range_count(nums, 5, 8))

Query complexity is good, but inserts in the middle are costly.

Ordered Dynamic Container for Frequent Updates

When insert and delete operations are frequent, use an ordered structure with efficient boundary search.

python
1from sortedcontainers import SortedList
2
3sl = SortedList([8, 3, 5, 10, 7, 6])
4sl.add(9)
5sl.discard(3)
6
7left = sl.bisect_left(5)
8right = sl.bisect_right(9)
9
10print(list(sl.islice(left, right)))
11print(right - left)

This keeps updates and queries balanced with clean API support.

Fenwick Tree for Bounded Integer Count Queries

If values map to a known bounded domain and you mainly need counts, Fenwick trees are highly efficient.

python
1class Fenwick:
2    def __init__(self, size):
3        self.bit = [0] * (size + 1)
4
5    def add(self, index, delta):
6        while index < len(self.bit):
7            self.bit[index] += delta
8            index += index & -index
9
10    def prefix_sum(self, index):
11        result = 0
12        while index > 0:
13            result += self.bit[index]
14            index -= index & -index
15        return result
16
17    def range_count(self, left, right):
18        return self.prefix_sum(right) - self.prefix_sum(left - 1)
19
20
21fw = Fenwick(10)
22for value in [2, 3, 3, 7, 9]:
23    fw.add(value, 1)
24
25print(fw.range_count(3, 7))

If domain values are sparse and large, coordinate compression is usually required.

Segment trees are useful when you need more than counts, such as range minimum, maximum, or sum with updates. They are powerful but more complex than needed for simple membership queries.

Use them when richer range statistics justify implementation and maintenance cost.

Interval trees solve overlap between intervals, not integer membership queries over point values. Keep these problem types separate.

API Design Affects Performance

Structure choice alone is not enough. API shape can dominate cost.

Prefer separate methods such as:

  • count_in_range(lo, hi)
  • exists_in_range(lo, hi)
  • values_in_range(lo, hi, limit)

Returning full value lists when callers only need counts wastes memory and latency budget.

Benchmark With Real Data Patterns

Theoretical complexity helps, but real data distribution matters.

Benchmark using realistic scenarios:

  • Hot ranges hit frequently.
  • Duplicate-heavy values.
  • Mixed query and update ratios.
  • Large result ranges versus small result ranges.

Measure both latency and memory behavior, not only algorithmic notation.

Practical Selection Guide

A practical rule:

  • Static query-heavy dataset uses sorted list plus binary search.
  • Frequent updates with ordered range lookup uses ordered dynamic container.
  • Bounded-domain count focus uses Fenwick or segment structure.

Choosing this way avoids unnecessary complexity while meeting performance goals.

Common Pitfalls

  • Using complex trees for static data where sorted arrays are enough.
  • Ignoring duplicates and producing wrong counts.
  • Returning full result lists when count-only is needed.
  • Skipping coordinate compression for sparse large domains in Fenwick-based designs.
  • Selecting structures from theory alone without workload benchmarks.

Summary

  • Range-query structure choice should follow operation and workload requirements.
  • Sorted arrays with binary search are excellent for static query-heavy datasets.
  • Ordered dynamic containers fit frequent insert and delete workloads.
  • Fenwick and segment structures are strong for bounded-domain range statistics.
  • Separate count and retrieval APIs to keep performance predictable at scale.

Course illustration
Course illustration

All Rights Reserved.