linked list
algorithm
random selection
single pass
data structure

Get a random element in single direction linked list by one time traverse

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

In the realm of computer science, particularly data structures, a linked list offers a simplistic and dynamic alternative to arrays. In a singly-linked list, each node points to the next node and the list terminates at a node pointing to `null`. An intriguing problem involving singly-linked lists is selecting a random element while traversing the list only once. This article explores a method called Reservoir Sampling to achieve this.

Reservoir Sampling

Reservoir Sampling is an algorithm primarily used when dealing with a large or unknown data stream size. This method is particularly beneficial for selecting a random element from a singly-linked list without prior knowledge of its length, making it perfect for single-pass operations.

Algorithm Explanation

  1. Initialization: Begin by initializing a counter `n` to 1 and choosing the first node as your initial random element (let’s call this node `chosen`).
  2. Traversal: Iterate over the rest of the list. For each node at index `i` (where `i` starts from 2):
    • Generate a random integer `r` such that `1 <= r <= i`.
    • If `r` equals `i`, update `chosen` to the current node. Essentially, this step allows each node to have a fair probability of being selected as the random node.
  3. Termination: By the time the traversal has reached the end of the list, the node in `chosen` will be the desired random node. The probability that any node gets selected is `1/n`, where `n` is the length of the linked list.

Detailed Example

Consider a singly-linked list containing the nodes [A, B, C, D]. Implementing the Reservoir Sampling algorithm looks like this:

  • Step 1: Start with `chosen = A` and `n = 1`.
  • Step 2: Move to node B, generate `r` between `[1, 2]`. If `r` = 2, update `chosen = B`.
  • Step 3: Move to node C, generate `r` between `[1, 3]`. If `r` = 3, update `chosen = C`.
  • Step 4: Move to node D, generate `r` between `[1, 4]`. If `r` = 4, update `chosen = D`.

At the end of the traversal, `chosen` is a randomly selected node from the list.

Mathematical Justification

To understand why this method works, let's delve a bit into its probability model:

  • For node `k` to be selected as the final element, it must replace the current element at round `k` and never be replaced afterwards.
  • When `k` is encountered, it gets selected with probability `1/k`.
  • As the list proceeds, it should not be replaced by each subsequent node `i` from `k+1` to `n`, giving it a cumulative probability: P(k is chosen)=1kkk+1k+1k+2n1nP(k \text{ is chosen}) = \frac{1}{k} \cdot \frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdots \frac{n-1}{n} Which simplifies to: P(k is chosen)=1nP(k \text{ is chosen}) = \frac{1}{n}

This demonstrates that each node has an equal probability of being chosen, thereby justifying the algorithm's correctness.

Complexity Analysis

  • Time Complexity: O(n), where `n` is the number of nodes in the list. This is due to the single traversal required over the linked list.
  • Space Complexity: O(1), given that constant space is used, disregarding the input size.

Use Case Scenarios

  • Stream Processing: Ideal for processing elements in streaming data as it requires minimal memory.
  • Large Dataset Sampling: Efficient for data sampling from large datasets where the size is unknown or it is impractical to load all elements into memory.
  • Distributed Systems: Useful in distributed computing environments where elements are distributed across multiple nodes.

Summary

The following table encapsulates the core aspects of using Reservoir Sampling for picking a random element from a singly-linked list:

AspectDescription
PurposeSelect a random element with a single traversal.
Algorithm UsedReservoir Sampling
Space ComplexityO(1)
Time ComplexityO(n)
Ideal ScenariosStream processing, large datasets, distributed systems
ProbabilityEach node has a 1/n probability of being chosen, where n is the list length.

Reservoir Sampling provides an elegant and sophisticated solution to the problem of random selection from a singly linked list, ensuring efficiency both in time and computational resources. It has distinct advantages in scenarios involving large or streaming datasets.


Course illustration
Course illustration

All Rights Reserved.