Working with small probabilities, via logs

Introduction

When dealing with small probabilities, one of the major challenges is numerical underflow and loss of precision, especially when many tiny values are multiplied together. A standard fix is to work with logarithms of probabilities instead. That turns fragile multiplication into stable addition and is one of the core numerical tricks in machine learning and probabilistic modeling.

Understanding Logarithmic Transformation

Let's explore why logs are so useful. Suppose we are working with probabilities that are extremely small, say $p = 10^{-10}$. Multiplying such values shrinks the result very quickly. A product of two of them is $10^{-20}$, and a product of 40 of them is $10^{-400}$. That is far below the smallest positive IEEE 754 double (about $4.9 \times 10^{-324}$), so it underflows to exactly zero. By using logarithms, specifically natural logarithms (written $\log$ below, i.e. $\ln$), we transform the product of two probabilities $p_1$ and $p_2$ as follows:

$$\log(p_1 \times p_2) = \log(p_1) + \log(p_2)$$

This transformation changes multiplications into additions. The gain is in range rather than in arithmetic speed. Each $\log(p_i)$ is a moderate negative number (for $p = 10^{-10}$, about $-23.03$), so a sum of thousands of such terms stays comfortably representable. The corresponding product would underflow to zero.

Practical Example

Consider two independent events $A$ and $B$ with probabilities $p(A) = 0.00001$ and $p(B) = 0.00002$. The probability that both occur is, by direct multiplication:

$$p(A \cap B) = p(A) \cdot p(B) = 0.00001 \cdot 0.00002 = 2 \times 10^{-10}$$

Using logarithmic transformation:

  • $\log(p(A)) = \log(0.00001) \approx -11.5129$
  • $\log(p(B)) = \log(0.00002) \approx -10.8198$

Sum of logs:

$$\log(p(A \cap B)) = \log(p(A)) + \log(p(B)) \approx -11.5129 + (-10.8198) = -22.3327$$

Converting back to probability:

$$p(A \cap B) = e^{-22.3327} \approx 2 \times 10^{-10}$$

Both routes give the same answer here, because a single product of two small numbers is still well within double-precision range. The log route pays off when many factors are combined.

Python Example

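```python
import math

# Two small, independent event probabilities
p_a = 1e-5
p_b = 2e-5

# Multiply directly in probability space
direct = p_a * p_b

# Add in log space, then exponentiate to recover the probability
log_total = math.log(p_a) + math.log(p_b)
recovered = math.exp(log_total)

print(direct)     # ~2e-10
print(log_total)  # ~-22.3327
print(recovered)  # ~2e-10 (may differ from direct in the last digits)
```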

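This pattern scales much better as the number of multiplied probabilities grows. The direct product eventually underflows to 0.0, while the sum of logs simply becomes a larger negative number.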
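To see the difference concretely, the sketch below multiplies 400 hypothetical probabilities of $10^{-3}$ each. This setup is chosen for illustration only. The true product, $10^{-1200}$, is far below the smallest positive double, so the direct loop underflows to `0.0`. The log-space sum stays an ordinary finite number.

```python
import math

# Hypothetical setup: 400 independent events, each with probability 1e-3
probs = [1e-3] * 400

# Direct product in probability space
direct = 1.0
for p in probs:
    direct *= p

# Same quantity accumulated in log space (fsum reduces rounding error)
log_total = math.fsum(math.log(p) for p in probs)

print(direct)     # 0.0: the true value 1e-1200 is below the double range
print(log_total)  # about -2763.1, i.e. 400 * log(1e-3)
```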

Benefits of Using Logarithms

  1. Numerical Stability: Avoids underflow/overflow issues by operating within ranges where the floating-point arithmetic retains precision.
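  2. Performance Efficiency: Once probabilities are stored as logs, combining them needs only additions, and the cost of computing the logarithms is paid once. On modern hardware, floating-point addition and multiplication cost roughly the same, so the main gain is stability rather than raw speed.
  3. Simplified Derivatives: In optimization, the log of a product of likelihood terms is a sum. Its gradient is therefore a sum of per-term gradients, which is much easier to compute than the gradient of the product.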
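To make the third point concrete, here is a minimal sketch using a Bernoulli model with parameter $\theta$. This example was chosen for illustration and does not come from the text.

- The likelihood of independent 0/1 observations $x_i$ is the product $\prod_i \theta^{x_i}(1-\theta)^{1-x_i}$.
- Its log is the sum $\sum_i [x_i \log\theta + (1-x_i)\log(1-\theta)]$.
- The gradient is therefore just $\sum_i [x_i/\theta - (1-x_i)/(1-\theta)]$.

The function names and data below are hypothetical.

```python
import math


def bernoulli_log_likelihood(theta, data):
    """Sum of per-observation log-probabilities for 0/1 outcomes."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in data)


def bernoulli_log_likelihood_grad(theta, data):
    """Derivative w.r.t. theta: a sum of simple per-observation terms."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in data)


# Hypothetical observations (1 = success, 0 = failure)
data = [1, 0, 1, 1, 0, 1, 0, 1]

theta_hat = sum(data) / len(data)  # sample mean, 0.625 here
print(bernoulli_log_likelihood_grad(theta_hat, data))  # ~0 at the optimum

# Sanity check of the analytic gradient against a central finite difference
theta, h = 0.4, 1e-6
numeric = (bernoulli_log_likelihood(theta + h, data)
           - bernoulli_log_likelihood(theta - h, data)) / (2 * h)
print(numeric, bernoulli_log_likelihood_grad(theta, data))
```

The gradient vanishes at the sample mean, which is the familiar maximum-likelihood estimate for a Bernoulli parameter.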

Log-Sum-Exp Trick

A crucial subtopic in log-scale computation is the log-sum-exp trick. It computes the log of a sum of exponentials without overflow or underflow. That is exactly the operation needed to add probabilities that are stored as logs.

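For a set of numbers $x_1, x_2, \dots, x_n$, the quantity we want is the log of the sum of their exponentials:

$$\log\left(\sum_{i=1}^{n} e^{x_i}\right)$$

Evaluating this directly has two failure modes:

  • It overflows if any $x_i$ is large. In double precision, $e^{x}$ exceeds the largest representable value once $x$ is above about $709.78$.
  • It produces $\log(0)$ if every $x_i$ is very negative, because each $e^{x_i}$ underflows to zero.

Instead, first compute:

$$M = \max(x_1, x_2, \dots, x_n)$$

Then

$$\log\left(\sum_{i=1}^{n} e^{x_i}\right) = M + \log\left(\sum_{i=1}^{n} e^{x_i - M}\right)$$

This adjustment uses $M$ to scale the exponentials. After the shift, the largest term is $e^{0} = 1$ and every term is at most 1. The sum therefore lies between $1$ and $n$ and can neither overflow nor collapse to zero.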
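The max-shift formula can be sketched in plain Python as follows. `log_sum_exp` is a hypothetical helper name. Returning $-\infty$ for an empty input or an all-$-\infty$ input is an assumption about how to represent $\log(0)$.

```python
import math


def log_sum_exp(xs):
    """Return log(sum(exp(x) for x in xs)) without overflow or underflow."""
    if not xs:
        return -math.inf  # assumption: empty sum is 0, and log(0) is -inf
    m = max(xs)
    if math.isinf(m):
        # All terms -inf (sum is 0) or some term +inf: x - m would give nan
        return m
    # After the shift the largest term is exp(0) = 1, so the sum lies in [1, n]
    return m + math.log(math.fsum(math.exp(x - m) for x in xs))


big = [1000.0, 1000.0]
# math.log(sum(math.exp(x) for x in big)) raises OverflowError here
print(log_sum_exp(big))   # 1000 + log(2), about 1000.6931

tiny = [-1000.0, -1000.0]
# The naive version computes math.log(0.0), which raises ValueError
print(log_sum_exp(tiny))  # -1000 + log(2), about -999.3069
```

For array code, SciPy provides the same operation as `scipy.special.logsumexp`.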

Applications in Machine Learning

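  • Softmax Functions: In classification algorithms, the softmax function is usually stabilized with the log-sum-exp trick. Log-softmax is computed as $z_i - \log\sum_j e^{z_j}$, and the plain softmax subtracts $\max_j z_j$ from every input before exponentiating.
  • Loss Functions: Cross-entropy loss in neural networks is typically computed directly from log probabilities (for example, log-softmax outputs). This avoids taking the log of a probability that has already rounded to zero.
  • Probabilistic Graphical Models: Algorithms such as Expectation-Maximization work with log-likelihoods, because the log of a product of factors decomposes into a sum of per-factor terms.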
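The first two applications can be sketched in plain Python. Log-softmax uses the max shift, and cross-entropy is read directly from the log-probabilities. `log_softmax`, `cross_entropy`, and the logits are hypothetical illustrations, not a particular framework's API.

```python
import math


def log_softmax(logits):
    """log(softmax(z))_i = z_i - logsumexp(z), computed with the max shift."""
    m = max(logits)
    lse = m + math.log(math.fsum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, target):
    """Negative log-probability of the target class, taken from log-softmax."""
    return -log_softmax(logits)[target]


# Hypothetical logits for a 3-class classifier
logits = [1000.0, 0.0, -1000.0]
print(log_softmax(logits))       # [0.0, -1000.0, -2000.0]
print(cross_entropy(logits, 1))  # loss if the true class were 1: 1000.0
```

A naive softmax would call `math.exp(1000.0)` and raise `OverflowError` before it could normalize.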

Common Pitfalls

  • Multiplying many tiny probabilities directly until the result underflows toward zero.
  • Forgetting that adding two probabilities stored as logs requires log-sum-exp. Ordinary + on log values multiplies the underlying probabilities.
  • Taking the logarithm of zero without deciding how zero-probability events should be represented.
  • Exponentiating back into probability space too early and reintroducing numerical problems.
  • Comparing raw probabilities when comparing log probabilities would be safer. Since log is monotonically increasing, the ordering is the same.
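Two of these pitfalls are taking the log of zero and adding in log space. Both can be handled with small helpers like the ones below. The names `safe_log` and `log_add` are hypothetical. Mapping probability 0 to $-\infty$ is one common convention rather than the only option.

```python
import math


def safe_log(p):
    """Map probability 0 to -inf instead of letting math.log raise ValueError."""
    # Assumption: zero-probability events are represented as -inf in log space
    return math.log(p) if p > 0 else -math.inf


def log_add(log_a, log_b):
    """log(a + b) from log(a) and log(b), i.e. two-term log-sum-exp."""
    if log_a < log_b:
        log_a, log_b = log_b, log_a  # make log_a the larger term
    if log_b == -math.inf:
        return log_a  # adding a zero probability changes nothing
    return log_a + math.log1p(math.exp(log_b - log_a))


la, lb = safe_log(0.2), safe_log(0.3)
print(math.exp(la + lb))                 # ordinary +: multiplies, about 0.06
print(math.exp(log_add(la, lb)))         # log_add: adds, about 0.5
print(log_add(la, safe_log(0.0)) == la)  # True
```

NumPy offers the two-term version as `numpy.logaddexp`.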

Summary

  • Log probabilities turn unstable products into stable sums.
  • Working in log space is a standard way to avoid underflow with tiny probabilities.
  • The log-sum-exp trick is necessary when adding probabilities in log form.
  • Code often stays in log space until the final presentation step.
  • This technique is widely used in machine learning, Bayesian inference, and sequence models.
