arrays
block recognition
data analysis
algorithm design
programming techniques

Finding blocks in arrays

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finding blocks in arrays means detecting contiguous segments that share a property. The property can be exact value equality, threshold membership, or a custom predicate. Good implementations are linear, explicit about boundaries, and easy to adapt to different block definitions.

Define Block Semantics Before Coding

Most bugs in block detection come from unclear requirements, not from syntax. Decide these rules up front:

  • Whether boundaries are inclusive or exclusive.
  • Whether single element runs count as blocks.
  • How to handle missing values.
  • Whether output should include block values, indices, or both.

For analysis tasks, returning start index and end index is usually the most useful shape because downstream code can slice or aggregate efficiently.

Detect Equal Value Runs in One Pass

The classic case is grouping consecutive equal values. A single pass with a run start pointer gives linear time and constant extra memory.

python
1from typing import List, Tuple, Any
2
3
4def find_equal_blocks(arr: List[Any]) -> List[Tuple[int, int, Any]]:
5    if not arr:
6        return []
7
8    out = []
9    start = 0
10
11    for i in range(1, len(arr) + 1):
12        if i == len(arr) or arr[i] != arr[start]:
13            out.append((start, i - 1, arr[start]))
14            start = i
15
16    return out
17
18
19print(find_equal_blocks([2, 2, 2, 5, 5, 1, 1, 1, 1]))

This approach avoids nested loops and scales well for large arrays.

Detect Predicate Blocks

Often you need blocks that satisfy a condition instead of exact equality. For example, all contiguous values above a threshold.

python
1from typing import Callable
2
3
4def find_predicate_blocks(arr: List[float], pred: Callable[[float], bool]):
5    blocks = []
6    start = None
7
8    for i, value in enumerate(arr):
9        if pred(value):
10            if start is None:
11                start = i
12        else:
13            if start is not None:
14                blocks.append((start, i - 1))
15                start = None
16
17    if start is not None:
18        blocks.append((start, len(arr) - 1))
19
20    return blocks
21
22
23series = [0.1, 0.3, 0.8, 0.9, 0.2, 0.95, 0.97, 0.1]
24print(find_predicate_blocks(series, lambda x: x >= 0.8))

This pattern is useful in signal processing, monitoring windows, and anomaly segmentation.

Return Rich Block Metadata

If downstream steps need summaries, compute metadata while closing each block. That prevents extra passes.

python
1def summarize_blocks(arr):
2    runs = find_equal_blocks(arr)
3    return [
4        {
5            "start": s,
6            "end": e,
7            "value": v,
8            "length": e - s + 1,
9        }
10        for s, e, v in runs
11    ]
12
13
14print(summarize_blocks([1, 1, 3, 3, 3, 2]))

For heavy workloads, this can be extended with rolling statistics, but keep the base model simple first.

Handle Large Data Streams

When arrays are too large to fit comfortably in memory, process chunks and carry the final open block between chunks. The key detail is that a block can start in one chunk and end in the next.

A practical stream strategy:

  1. Process each chunk with block detection.
  2. Merge the first block of the new chunk with the previous carry block if values match.
  3. Emit completed blocks progressively.
  4. Keep only one carry block in memory.

This design supports log processing and telemetry pipelines without sacrificing correctness at chunk boundaries.

Test Boundary Cases Explicitly

At minimum, test these cases:

  • Empty array.
  • One element array.
  • All elements equal.
  • Alternating values.
  • Block ending at final index.

Example quick checks:

python
assert find_equal_blocks([]) == []
assert find_equal_blocks([7]) == [(0, 0, 7)]
assert find_equal_blocks([1, 2, 1, 2]) == [(0, 0, 1), (1, 1, 2), (2, 2, 1), (3, 3, 2)]

These assertions catch most off by one errors immediately.

Common Pitfalls

  • Forgetting to close the active block after the loop ends.
  • Mixing inclusive and exclusive index conventions across functions.
  • Treating missing values inconsistently when forming blocks.
  • Using nested scans and creating unnecessary quadratic behavior.
  • Returning output that is hard for downstream code to consume.

Summary

  • Define block rules before implementation, especially boundary semantics.
  • Use a one pass pointer model for linear time block detection.
  • Generalize with predicates for threshold and condition based blocks.
  • Add metadata at block close time to avoid extra traversals.
  • Test edge cases explicitly to prevent boundary bugs.

Course illustration
Course illustration

All Rights Reserved.