Finding blocks in arrays
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding blocks in arrays means detecting contiguous segments that share a property. The property can be exact value equality, threshold membership, or a custom predicate. Good implementations are linear, explicit about boundaries, and easy to adapt to different block definitions.
Define Block Semantics Before Coding
Most bugs in block detection come from unclear requirements, not from syntax. Decide these rules up front:
- Whether boundaries are inclusive or exclusive.
- Whether single element runs count as blocks.
- How to handle missing values.
- Whether output should include block values, indices, or both.
For analysis tasks, returning start index and end index is usually the most useful shape because downstream code can slice or aggregate efficiently.
Detect Equal Value Runs in One Pass
The classic case is grouping consecutive equal values. A single pass with a run start pointer gives linear time and constant extra memory.
This approach avoids nested loops and scales well for large arrays.
Detect Predicate Blocks
Often you need blocks that satisfy a condition instead of exact equality. For example, all contiguous values above a threshold.
This pattern is useful in signal processing, monitoring windows, and anomaly segmentation.
Return Rich Block Metadata
If downstream steps need summaries, compute metadata while closing each block. That prevents extra passes.
For heavy workloads, this can be extended with rolling statistics, but keep the base model simple first.
Handle Large Data Streams
When arrays are too large to fit comfortably in memory, process chunks and carry the final open block between chunks. The key detail is that a block can start in one chunk and end in the next.
A practical stream strategy:
- Process each chunk with block detection.
- Merge the first block of the new chunk with the previous carry block if values match.
- Emit completed blocks progressively.
- Keep only one carry block in memory.
This design supports log processing and telemetry pipelines without sacrificing correctness at chunk boundaries.
Test Boundary Cases Explicitly
At minimum, test these cases:
- Empty array.
- One element array.
- All elements equal.
- Alternating values.
- Block ending at final index.
Example quick checks:
These assertions catch most off by one errors immediately.
Common Pitfalls
- Forgetting to close the active block after the loop ends.
- Mixing inclusive and exclusive index conventions across functions.
- Treating missing values inconsistently when forming blocks.
- Using nested scans and creating unnecessary quadratic behavior.
- Returning output that is hard for downstream code to consume.
Summary
- Define block rules before implementation, especially boundary semantics.
- Use a one pass pointer model for linear time block detection.
- Generalize with predicates for threshold and condition based blocks.
- Add metadata at block close time to avoid extra traversals.
- Test edge cases explicitly to prevent boundary bugs.

