Data structure to find integers within a query range efficiently
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Range queries such as finding values where lo <= x <= hi appear in analytics, search, and indexing systems. The best data structure depends on query type, update frequency, and domain size. Picking by workload profile usually beats picking the most complex algorithm available.
Define Operations Before Choosing a Structure
Start by clarifying required operations:
- Count values in range only, or return actual values.
- Static dataset, append-only, or frequent insert and delete.
- Duplicate values allowed or unique-only.
- Integer domain bounded or very large and sparse.
These requirements determine whether a sorted array, balanced ordered container, Fenwick tree, or segment tree is a better fit.
Sorted Array Plus Binary Search
For mostly static data with many queries, a sorted list with binary search is often the fastest and simplest choice.
Query complexity is good, but inserts in the middle are costly.
Ordered Dynamic Container for Frequent Updates
When insert and delete operations are frequent, use an ordered structure with efficient boundary search.
This keeps updates and queries balanced with clean API support.
Fenwick Tree for Bounded Integer Count Queries
If values map to a known bounded domain and you mainly need counts, Fenwick trees are highly efficient.
If domain values are sparse and large, coordinate compression is usually required.
Segment Tree and Related Structures
Segment trees are useful when you need more than counts, such as range minimum, maximum, or sum with updates. They are powerful but more complex than needed for simple membership queries.
Use them when richer range statistics justify implementation and maintenance cost.
Interval trees solve overlap between intervals, not integer membership queries over point values. Keep these problem types separate.
API Design Affects Performance
Structure choice alone is not enough. API shape can dominate cost.
Prefer separate methods such as:
count_in_range(lo, hi)exists_in_range(lo, hi)values_in_range(lo, hi, limit)
Returning full value lists when callers only need counts wastes memory and latency budget.
Benchmark With Real Data Patterns
Theoretical complexity helps, but real data distribution matters.
Benchmark using realistic scenarios:
- Hot ranges hit frequently.
- Duplicate-heavy values.
- Mixed query and update ratios.
- Large result ranges versus small result ranges.
Measure both latency and memory behavior, not only algorithmic notation.
Practical Selection Guide
A practical rule:
- Static query-heavy dataset uses sorted list plus binary search.
- Frequent updates with ordered range lookup uses ordered dynamic container.
- Bounded-domain count focus uses Fenwick or segment structure.
Choosing this way avoids unnecessary complexity while meeting performance goals.
Common Pitfalls
- Using complex trees for static data where sorted arrays are enough.
- Ignoring duplicates and producing wrong counts.
- Returning full result lists when count-only is needed.
- Skipping coordinate compression for sparse large domains in Fenwick-based designs.
- Selecting structures from theory alone without workload benchmarks.
Summary
- Range-query structure choice should follow operation and workload requirements.
- Sorted arrays with binary search are excellent for static query-heavy datasets.
- Ordered dynamic containers fit frequent insert and delete workloads.
- Fenwick and segment structures are strong for bounded-domain range statistics.
- Separate count and retrieval APIs to keep performance predictable at scale.

