N-way merge
algorithm design
data structures
computer science
sorting algorithms

Algorithm for N-way merge

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

N-way merge combines multiple already-sorted sequences into one sorted output stream. It is a core building block in external sorting, log aggregation, streaming joins, and merge phases of many database engines. The most practical implementation uses a min-heap to always select the next smallest candidate efficiently.

Problem Definition

Given k sorted lists containing total n elements, produce one sorted sequence containing all elements.

A naive approach compares heads of all lists every step, which costs O of k per output element. For large k, that is inefficient.

Heap-Based N-way Merge

Use a min-heap storing one current element per input list.

Algorithm:

  1. push first element from each non-empty list into heap
  2. pop smallest element and append to output
  3. advance pointer in list from which element was popped
  4. push next element from that list
  5. repeat until heap is empty

This reduces per-element selection to logarithmic heap operations.

Python Implementation

python
1import heapq
2from typing import List
3
4
5def n_way_merge(sorted_lists: List[List[int]]) -> List[int]:
6    heap = []
7    result = []
8
9    # Heap items: (value, list_index, element_index)
10    for list_idx, arr in enumerate(sorted_lists):
11        if arr:
12            heapq.heappush(heap, (arr[0], list_idx, 0))
13
14    while heap:
15        value, list_idx, elem_idx = heapq.heappop(heap)
16        result.append(value)
17
18        next_idx = elem_idx + 1
19        if next_idx < len(sorted_lists[list_idx]):
20            next_value = sorted_lists[list_idx][next_idx]
21            heapq.heappush(heap, (next_value, list_idx, next_idx))
22
23    return result
24
25
26data = [
27    [1, 5, 9],
28    [2, 4, 8],
29    [3, 6, 7],
30]
31
32print(n_way_merge(data))

This is the standard and robust approach for in-memory inputs.

Complexity Analysis

Let n be total number of elements and k number of lists.

  • each element is pushed and popped once
  • each heap operation costs O of log k
  • total time is O of n log k
  • extra memory is O of k for heap plus output container

This scales much better than repeated full-head scans when k grows.

Streaming Merge Variant

If outputs should be consumed incrementally, implement merge as generator.

python
1import heapq
2from typing import Iterable, Iterator, List
3
4
5def merge_iterables(sorted_iters: List[Iterable[int]]) -> Iterator[int]:
6    heap = []
7    iterators = [iter(it) for it in sorted_iters]
8
9    for i, it in enumerate(iterators):
10        try:
11            first = next(it)
12            heapq.heappush(heap, (first, i))
13        except StopIteration:
14            pass
15
16    while heap:
17        value, i = heapq.heappop(heap)
18        yield value
19
20        try:
21            nxt = next(iterators[i])
22            heapq.heappush(heap, (nxt, i))
23        except StopIteration:
24            pass

Generator form is useful for large inputs or pipelines where writing full output list is unnecessary.

Handling Duplicate Values and Stable Ordering

Heap tuples include list index and position, which naturally provide deterministic tie-breaking when values are equal. This gives stable behavior for equal keys across runs.

If you need strict global stability by original source order, include additional sequence counters in heap entries.

Practical Use Cases

N-way merge appears in:

  • merge step after sorting chunks on disk
  • merging timestamped logs from services
  • combining sorted result shards in distributed systems
  • k-way merge in map-reduce style pipelines

In these domains, streaming output and memory efficiency often matter more than raw CPU speed alone.

External Sorting Context

When data exceeds memory:

  1. split file into chunks
  2. sort each chunk and write temporary sorted files
  3. run N-way merge over temporary files
  4. write final sorted output

N-way merge is the critical final phase that keeps disk reads sequential and predictable.

Common Pitfalls

A common pitfall is using pairwise merging repeatedly, which can increase total work compared with a heap-based single N-way pass. Another is loading all data into memory before merge, defeating streaming advantages. Teams also sometimes forget to handle empty lists and hit index errors during heap initialization. Finally, unstable tie handling can produce nondeterministic output order for equal values if heap keys are underspecified.

Summary

  • N-way merge combines multiple sorted inputs into one sorted output.
  • Min-heap implementation gives O of n log k time complexity.
  • Streaming generator variants are ideal for large or continuous inputs.
  • Tie-breaking should be explicit for deterministic outputs.
  • This algorithm is central to external sorting and distributed data pipelines.

Course illustration
Course illustration

All Rights Reserved.