Calculate the Cumulative Distribution Function (CDF) in Python
Introduction
In Python, calculating a CDF depends on what kind of distribution you have. If you know the theoretical distribution, use a statistics library such as SciPy to evaluate the distribution's cdf function directly. If you only have observed data, build an empirical CDF from the sorted sample.
Parametric CDF with SciPy
For a known distribution such as the normal distribution, SciPy gives you the CDF directly.
For example, norm.cdf(1.5, loc=0, scale=1) from scipy.stats returns the probability that a normal random variable with mean 0 and standard deviation 1 is less than or equal to 1.5, which is about 0.9332.
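The call described above can be sketched as follows. This is a minimal example, assuming SciPy is installed; the variable name p is only illustrative.

```python
from scipy.stats import norm

# P(X <= 1.5) for X ~ Normal(mean=0, std=1).
# loc is the mean and scale is the standard deviation (defaults: 0 and 1).
p = norm.cdf(1.5, loc=0, scale=1)
print(round(p, 4))  # 0.9332
```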
The same pattern works for many other distributions in scipy.stats, such as expon, binom, or poisson. The main idea is always the same: choose the distribution object and call its cdf method with the relevant parameters.
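As a rough illustration of that pattern, the sketch below evaluates the CDF for three other distributions. The parameter values are arbitrary choices for the example. Note that SciPy parameterizes the exponential distribution by scale (1 / rate), not by the rate itself.

```python
from scipy.stats import binom, expon, poisson

# Exponential with rate 2, expressed as scale = 1 / rate = 0.5.
p_expon = expon.cdf(1.0, scale=0.5)    # P(X <= 1.0) = 1 - exp(-2)

# Binomial with n = 10 trials and success probability 0.5.
p_binom = binom.cdf(3, n=10, p=0.5)    # P(X <= 3) = 176 / 1024

# Poisson with mean 4.
p_poisson = poisson.cdf(2, mu=4)       # P(X <= 2) = 13 * exp(-4)

print(p_expon, p_binom, p_poisson)
```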
Empirical CDF from Sample Data
If you do not want to assume a theoretical distribution, compute an empirical CDF from observed values.
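Sort the sample and pair the i-th sorted value with the cumulative proportion i/n, where n is the sample size. Calling that array cdf_values, each point is the fraction of observations less than or equal to the corresponding sorted sample value (for repeated values, the last of the ties carries the correct fraction). This is a direct, non-parametric description of the data.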
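One way to build the empirical CDF with NumPy is sketched below. The sample values are made up for illustration, and the names sorted_samples and cdf_values are assumptions chosen to match the description above.

```python
import numpy as np

# Hypothetical sample, used only for illustration.
samples = np.array([2.3, 1.1, 3.8, 2.9, 1.7])

sorted_samples = np.sort(samples)
n = sorted_samples.size

# Cumulative proportions 1/n, 2/n, ..., 1 paired with the sorted values.
cdf_values = np.arange(1, n + 1) / n

for x, f in zip(sorted_samples, cdf_values):
    print(f"F({x}) = {f:.1f}")
```

With distinct values, as here, every entry of cdf_values is exactly the fraction of observations at or below its sorted value.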
If you want to evaluate the empirical CDF at a specific query point, count how many samples are less than or equal to that point.
That is the simplest possible empirical CDF calculation.
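A minimal sketch of that point evaluation is shown below. The helper names ecdf_at and ecdf_many are hypothetical, and the sample is invented for the example. The second helper is an optional variation that uses np.searchsorted to evaluate many query points at once.

```python
import numpy as np

def ecdf_at(samples, x):
    """Fraction of samples less than or equal to x."""
    samples = np.asarray(samples)
    return np.mean(samples <= x)

def ecdf_many(samples, queries):
    """Empirical CDF evaluated at several query points."""
    sorted_samples = np.sort(np.asarray(samples))
    # side="right" counts how many sorted values are <= each query.
    counts = np.searchsorted(sorted_samples, queries, side="right")
    return counts / sorted_samples.size

samples = [2.3, 1.1, 3.8, 2.9, 1.7]
print(ecdf_at(samples, 2.5))  # 0.6
print(ecdf_many(samples, [1.0, 2.5, 4.0]))
```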
Plot the CDF
A CDF is often easier to understand visually than numerically. For an empirical CDF, a step plot is usually the clearest representation.
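A step plot of an empirical CDF might look like the sketch below, assuming Matplotlib is available. The helper name plot_ecdf and the synthetic normal sample are assumptions made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_ecdf(samples, ax=None):
    """Draw the empirical CDF of samples as a step function."""
    if ax is None:
        _, ax = plt.subplots()
    x = np.sort(np.asarray(samples))
    y = np.arange(1, x.size + 1) / x.size
    # where="post" holds each level until the next sorted value,
    # so the curve jumps to i/n exactly at the i-th sorted sample.
    ax.step(x, y, where="post")
    ax.set_xlabel("value")
    ax.set_ylabel("empirical CDF")
    return ax

rng = np.random.default_rng(0)
ax = plot_ecdf(rng.normal(size=200))  # synthetic data for illustration
plt.show()
```

Returning the axes object lets the caller overlay other curves, such as a theoretical CDF, on the same plot.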
For a theoretical distribution, you can generate a grid of x values and call the distribution's cdf method across the grid.
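For the theoretical case, a sketch along these lines evaluates the CDF over a grid and plots it as a smooth curve. The standard normal distribution and the grid range from -4 to 4 are arbitrary choices for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# Grid of x values covering most of the standard normal's mass.
x = np.linspace(-4, 4, 400)
y = norm.cdf(x, loc=0, scale=1)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("P(X <= x)")
ax.set_title("Standard normal CDF")
plt.show()
```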
This distinction between theoretical and empirical CDFs matters a lot in analysis. A theoretical CDF assumes a model, such as normal or exponential, while an empirical CDF is just a summary of what the sample actually contains. One is model-based, the other is data-based.
Choosing the wrong one can turn a simple probability question into a modeling mistake.
That is why good statistical code starts by deciding whether the problem is about a known distribution or about observed data only.
The CDF formula may be simple, but the modeling choice behind it is not.
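To make the difference concrete, the sketch below answers the same question two ways on synthetic, right-skewed data: once from the sample itself and once through a normal model that does not fit. The data, the query point, and the choice of a normal model are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
# Synthetic, strongly right-skewed data (exponential with mean 1).
samples = rng.exponential(scale=1.0, size=5000)
query = 0.1

# Data-based answer: fraction of observations at or below the query point.
p_empirical = np.mean(samples <= query)

# Model-based answer under a deliberately wrong normal assumption,
# with mean and standard deviation estimated from the same sample.
p_normal = norm.cdf(query, loc=samples.mean(), scale=samples.std(ddof=1))

print(f"empirical: {p_empirical:.3f}, normal model: {p_normal:.3f}")
```

The two estimates disagree noticeably because a symmetric normal curve cannot describe data that is bounded below by zero and has a long right tail.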
Common Pitfalls
- Mixing up the CDF with the PDF or PMF.
- Using a theoretical distribution when the data should be handled empirically.
- Forgetting that the CDF is the probability of being less than or equal to a value.
- Comparing empirical and theoretical CDFs without checking parameter assumptions.
- Plotting an empirical CDF without sorting the sample first, which produces a jagged line instead of a monotone step curve.
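The first and third pitfalls above can be illustrated with a short sketch. The distributions and values are arbitrary examples, not taken from any particular dataset.

```python
from scipy.stats import binom, norm

# PDF vs CDF: a density value is not a probability.
density = norm.pdf(0.0)   # height of the standard normal curve at 0, about 0.3989
prob = norm.cdf(0.0)      # P(X <= 0) = 0.5

# PMF vs CDF for a discrete distribution (10 fair coin flips).
p_exactly_3 = binom.pmf(3, n=10, p=0.5)   # P(X == 3) = 120 / 1024
p_at_most_3 = binom.cdf(3, n=10, p=0.5)   # P(X <= 3) = 176 / 1024
p_below_3 = binom.cdf(2, n=10, p=0.5)     # P(X < 3)  =  56 / 1024

print(density, prob)
print(p_exactly_3, p_at_most_3, p_below_3)
```

For discrete distributions, "less than" and "less than or equal to" give different answers, so a strict inequality needs the CDF evaluated one step lower.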
Summary
- Use scipy.stats.<distribution>.cdf(...) for theoretical distributions.
- Use sorting and cumulative proportions for an empirical CDF.
- A simple np.mean(samples <= x) computes the empirical CDF at one point.
- Step plots are a natural way to visualize empirical CDFs.
- Pick the method based on whether you know the distribution or only have sample data.

