Calculate the Cumulative Distribution Function (CDF) in Python

Introduction

In Python, calculating a CDF depends on what kind of distribution you have. If you know the theoretical distribution, use a statistics library such as SciPy to evaluate the distribution's cdf function directly. If you only have observed data, build an empirical CDF from the sorted sample.

Parametric CDF with SciPy

For a known distribution such as the normal distribution, SciPy gives you the CDF directly.

python

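from scipy.stats import norm

mu = 0.0     # mean of the distribution
sigma = 1.0  # standard deviation
x = 1.5      # point at which to evaluate the CDF

# P(X <= x) for X ~ Normal(mu, sigma)
value = norm.cdf(x, loc=mu, scale=sigma)
print(value)  # approximately 0.9332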

This returns the probability that a normal random variable with mean 0 and standard deviation 1 is less than or equal to 1.5.

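The same pattern works for many other distributions in scipy.stats, such as expon, binom, or poisson: choose the distribution object and call its cdf method with that distribution's parameters. Continuous distributions are parameterized through loc and scale (plus any shape parameters), while discrete ones such as binom and poisson take shape parameters (n and p, or mu).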
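As a minimal sketch of that pattern, the snippet below evaluates the CDF of three other distributions. The parameter values (rate, number of trials, mean count) are arbitrary choices for illustration and do not come from any dataset.

```python
from scipy.stats import binom, expon, poisson

# Exponential with rate lam: SciPy parameterizes it by scale = 1 / lam
lam = 0.5  # illustrative rate
p_expon = expon.cdf(2.0, scale=1.0 / lam)   # P(X <= 2.0)

# Binomial: probability of at most 3 successes in 10 trials with p = 0.5
p_binom = binom.cdf(3, n=10, p=0.5)         # P(K <= 3)

# Poisson: probability of at most 2 events when the mean count is 3
p_poisson = poisson.cdf(2, mu=3.0)          # P(K <= 2)

print(p_expon, p_binom, p_poisson)
```

Note that expon takes a scale rather than a rate, so a rate has to be inverted before it is passed in.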

Empirical CDF from Sample Data

If you do not want to assume a theoretical distribution, compute an empirical CDF from observed values.

python

import numpy as np

samples = np.array([2.0, 1.0, 4.0, 3.0, 3.0])
sorted_samples = np.sort(samples)

# The i-th smallest value (1-based) gets cumulative proportion i / n
cdf_values = np.arange(1, len(sorted_samples) + 1) / len(sorted_samples)

print(sorted_samples)  # [1. 2. 3. 3. 4.]
print(cdf_values)      # [0.2 0.4 0.6 0.8 1. ]

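Each entry in cdf_values is the cumulative proportion i/n for the i-th smallest observation. When all values are distinct, that equals the fraction of observations less than or equal to the corresponding sorted value. When values repeat, as 3.0 does here, only the last of the tied entries carries that fraction (0.8 for 3.0); the earlier tied entry (0.6) is an intermediate point that a step plot draws as part of the same vertical jump. This is a direct, non-parametric description of the data.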
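When the sample contains repeated values, one way to get exactly one CDF value per distinct observation is to count the ties first. The sketch below uses np.unique with return_counts=True; the variable names are illustrative.

```python
import numpy as np

samples = np.array([2.0, 1.0, 4.0, 3.0, 3.0])

# np.unique returns the sorted distinct values and how often each occurs
values, counts = np.unique(samples, return_counts=True)

# Cumulative counts divided by n give the fraction of samples <= each value
ecdf = np.cumsum(counts) / samples.size

print(values)
print(ecdf)
```

Here the value 3.0 appears once in values, and its entry in ecdf is 0.8, the fraction of the five samples that are less than or equal to 3.0.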

If you want to evaluate the empirical CDF at a specific query point, count how many samples are less than or equal to that point.

python

import numpy as np

samples = np.array([2.0, 1.0, 4.0, 3.0, 3.0])
x = 3.0

# samples <= x is a boolean array; its mean is the fraction of True entries
empirical_cdf = np.mean(samples <= x)
print(empirical_cdf)  # 0.8

This one-liner needs no sorting and makes a single pass over the data, which makes it the simplest way to evaluate an empirical CDF at one point.
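To evaluate the empirical CDF at many query points, repeating the comparison for each point scans the whole sample every time. A common alternative, sketched here, sorts once and uses np.searchsorted with side="right" to count the values less than or equal to each query. The query points are made up for illustration.

```python
import numpy as np

samples = np.array([2.0, 1.0, 4.0, 3.0, 3.0])
sorted_samples = np.sort(samples)

queries = np.array([0.5, 3.0, 10.0])  # illustrative query points

# side="right" returns, for each query q, the number of samples <= q
counts = np.searchsorted(sorted_samples, queries, side="right")
ecdf_at_queries = counts / len(sorted_samples)

print(ecdf_at_queries)
```

A query below the smallest sample gives 0, and a query above the largest gives 1, as expected of a CDF.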

Plot the CDF

A CDF is often easier to understand visually than numerically. For an empirical CDF, a step plot is usually the clearest representation.

python

import matplotlib.pyplot as plt
import numpy as np

# Draw 1000 samples from a standard normal distribution
samples = np.random.normal(loc=0, scale=1, size=1000)
sorted_samples = np.sort(samples)
cdf_values = np.arange(1, len(sorted_samples) + 1) / len(sorted_samples)

# where="post" holds each value until the next sample, as a step function
plt.step(sorted_samples, cdf_values, where="post")
plt.xlabel("x")
plt.ylabel("F(x)")
plt.title("Empirical CDF")
plt.show()

For a theoretical distribution, you can generate a grid of x values and call the distribution's cdf method across the grid.
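A minimal sketch of that grid approach for a standard normal distribution follows. The grid range and the number of points are arbitrary choices, not recommendations.

```python
import numpy as np
from scipy.stats import norm

# Grid covering four standard deviations on each side of the mean
xs = np.linspace(-4.0, 4.0, 201)

# norm.cdf is vectorized, so it evaluates the whole grid in one call
ys = norm.cdf(xs, loc=0.0, scale=1.0)

print(xs[100], ys[100])  # grid midpoint and its CDF value
```

Passing xs and ys to plt.plot(xs, ys) draws the curve. A line plot suits a continuous theoretical CDF, whereas a step plot suits an empirical one.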

The distinction between theoretical and empirical CDFs matters because the two answer different questions. A theoretical CDF assumes a model, such as normal or exponential, and its values are only as good as that assumption. An empirical CDF is a summary of what the sample actually contains. One is model-based, the other is data-based.

Choosing the wrong one can turn a simple probability question into a modeling mistake: a model that does not fit the data gives probabilities the observations do not support.

That is why good statistical code starts by deciding whether the problem is about a known distribution or about observed data only.

The CDF formula may be simple, but the modeling choice behind it is not.
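To make the modeling choice concrete, the sketch below asks one question of the same sample in three ways. The data is synthetic, simulated from an exponential distribution purely for illustration, and the seed and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import expon, norm

# Synthetic, strictly positive data (simulated for illustration only)
rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=1000)

x = 0.0  # question: what fraction of values is <= 0?

# Data-based answer: the empirical CDF at x
empirical = np.mean(samples <= x)

# Model-based answers, with parameters estimated from the sample
normal_model = norm.cdf(x, loc=samples.mean(), scale=samples.std(ddof=1))
expon_model = expon.cdf(x, scale=samples.mean())

print(empirical, normal_model, expon_model)
```

The normal model assigns a clearly positive probability to values at or below zero even though the sample contains none, while the empirical CDF and the exponential model both give zero. The formula is the same in each case; only the assumption behind it changes.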

Common Pitfalls

Mixing up the CDF with the PDF or PMF.
Using a theoretical distribution when the data should be handled empirically.
Forgetting that the CDF is the probability of being less than or equal to a value.
Comparing empirical and theoretical CDFs without checking parameter assumptions.
Plotting an empirical CDF without sorting the samples first.
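The first and third pitfalls can be checked directly in code. This sketch, with arbitrary illustrative parameters, puts the PDF and PMF next to the CDF for the same inputs.

```python
from scipy.stats import binom, norm

# Continuous case: the PDF is a density, the CDF is a probability
density_at_0 = norm.pdf(0.0)   # height of the curve at 0, not a probability
prob_up_to_0 = norm.cdf(0.0)   # P(X <= 0)

# Discrete case: the PMF is P(K == k), the CDF is P(K <= k)
prob_exactly_3 = binom.pmf(3, n=10, p=0.5)
prob_at_most_3 = binom.cdf(3, n=10, p=0.5)

print(density_at_0, prob_up_to_0)
print(prob_exactly_3, prob_at_most_3)
```

The CDF value for the binomial case is larger than the PMF value because it also counts the outcomes 0, 1, and 2.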

Summary

Use scipy.stats.<distribution>.cdf(...) for theoretical distributions.
Use sorting and cumulative proportions for an empirical CDF.
A simple np.mean(samples <= x) computes the empirical CDF at one point.
Step plots are a natural way to visualize empirical CDFs.
Pick the method based on whether you know the distribution or only have sample data.