Bayesian networks tutorial
Introduction
A Bayesian network is a compact way to describe uncertainty when several variables influence one another. Instead of storing one huge probability table for every possible combination, it uses a directed acyclic graph plus local conditional probabilities to represent the same information much more efficiently.
What a Bayesian Network Represents
A Bayesian network has two parts.
- A graph, where each node is a random variable.
- A conditional probability table for each node, giving that node's distribution for every combination of its parents' values.
An arrow means that one variable directly helps explain another. If Rain points to WetGrass, the model is saying that the chance of wet grass depends directly on whether it rained.
The graph must be acyclic. In practice, that means you can follow arrows forward, but you can never loop back to where you started. This matters because the joint distribution is factorized along an ordering in which every node comes after its parents, and a cycle makes such an ordering impossible.
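One way to check the acyclicity requirement is to attempt a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the `is_acyclic` helper and both example graphs are illustrative assumptions, not part of any particular Bayesian network library.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(parents):
    """Return True if the graph admits a topological order.

    `parents` maps each node to the list of its parent nodes.
    """
    try:
        # Consuming static_order() raises CycleError if a cycle exists.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Hypothetical graphs for illustration.
valid = {"Rain": [], "WetGrass": ["Rain"]}
cyclic = {"Rain": ["WetGrass"], "WetGrass": ["Rain"]}

print(is_acyclic(valid))
print(is_acyclic(cyclic))
```

The first graph has a valid ordering (Rain, then WetGrass); the second does not, because each node would have to come before the other.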
A common beginner mistake is to think the arrows always represent physical causality. They often do, but not always. In a Bayesian network, an edge says only that the child's distribution is specified conditional on the parent; whether that reflects a causal mechanism is a separate modeling claim.
Factorization and Conditional Independence
The main value of a Bayesian network is that it turns a large joint distribution into smaller local pieces.
For a chain A -> B -> C, the joint distribution becomes:

$$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid B)$$
That factorization says something important: once B is known, A tells you nothing more about C, so P(C | A, B) = P(C | B). This is a conditional independence statement. Bayesian networks are useful because the graph makes those statements explicit.
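The independence claim can be checked numerically. The following sketch builds the joint distribution of a binary chain A -> B -> C from made-up probability tables (the numbers are assumptions chosen only for illustration) and confirms that P(C | A, B) does not change with A.

```python
from itertools import product

# Hypothetical CPTs for a binary chain A -> B -> C.
p_a = {True: 0.3, False: 0.7}            # P(A)
p_b_given_a = {True: 0.9, False: 0.2}    # P(B=True | A)
p_c_given_b = {True: 0.6, False: 0.1}    # P(C=True | B)


def bernoulli(p_true, value):
    """Probability that a binary variable equals `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c):
    # P(A, B, C) = P(A) * P(B | A) * P(C | B)
    return p_a[a] * bernoulli(p_b_given_a[a], b) * bernoulli(p_c_given_b[b], c)


def p_c_given_ab(a, b):
    # P(C=True | A=a, B=b), derived from the joint distribution.
    return joint(a, b, True) / (joint(a, b, True) + joint(a, b, False))


for a, b in product([True, False], repeat=2):
    print(f"A={a!s:5} B={b!s:5} P(C=True | A, B) = {p_c_given_ab(a, b):.3f}")
```

For each value of B, the two rows that differ only in A give the same probability, which is exactly the entry P(C=True | B) from the table.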
Here is a more familiar example from alarm systems:
That network factorizes as:
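As an illustration, assume the textbook burglary alarm network, which may differ from the network intended here: Burglary (B) and Earthquake (E) are the parents of Alarm (A), and Alarm is the only parent of JohnCalls (J) and MaryCalls (M). Under that assumed structure, each factor conditions a node on its parents only:

$$P(B, E, A, J, M) = P(B)\,P(E)\,P(A \mid B, E)\,P(J \mid A)\,P(M \mid A)$$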
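Without the network structure, you would need to store a probability for every full combination of values: 2^n - 1 independent numbers for n binary variables. With the network, each variable only needs a local table indexed by its parents' values, so storage depends on how many parents each node has rather than on the total number of variables.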
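A rough parameter count makes the saving concrete. The sketch below assumes a hypothetical five-node network of binary variables (the structure and node names are assumptions for illustration) and compares the full joint table with the sum of the local tables.

```python
# Hypothetical five-node network of binary variables: node -> list of parents.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def full_joint_parameters(num_vars):
    # A full joint table over n binary variables has 2^n entries,
    # one of which is determined by the sum-to-one constraint.
    return 2 ** num_vars - 1


def network_parameters(parents):
    # Each binary node needs one free probability per parent configuration.
    return sum(2 ** len(ps) for ps in parents.values())


print(full_joint_parameters(len(parents)))  # 31
print(network_parameters(parents))          # 10
```

The gap widens quickly: the full joint count doubles with every added variable, while the network count grows only by the size of the new node's local table.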
A Small Worked Example
Consider a simple weather model:
- Cloudy influences Sprinkler
- Cloudy influences Rain
- Sprinkler and Rain influence WetGrass
The graph is:

Cloudy -> Sprinkler
Cloudy -> Rain
Sprinkler -> WetGrass
Rain -> WetGrass
We can compute probabilities by multiplying local conditional probabilities. The following Python example implements exact inference by enumeration and asks for the probability of rain given that the grass is wet.
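A minimal sketch of such a program is shown here. The probability values are assumptions (commonly used illustrative numbers for this network), and the helper names `prob`, `joint`, and `query` are chosen for this example only.

```python
from itertools import product

# Illustrative CPT values (assumed; the text does not specify them).
P_CLOUDY = 0.5                           # P(Cloudy=True)
P_SPRINKLER = {True: 0.1, False: 0.5}    # P(Sprinkler=True | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}         # P(Rain=True | Cloudy)
P_WET = {                                # P(WetGrass=True | Sprinkler, Rain)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.00,
}

VARIABLES = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]


def prob(p_true, value):
    """Probability that a binary variable equals `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(cloudy, sprinkler, rain, wet):
    """P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R)."""
    return (
        prob(P_CLOUDY, cloudy)
        * prob(P_SPRINKLER[cloudy], sprinkler)
        * prob(P_RAIN[cloudy], rain)
        * prob(P_WET[(sprinkler, rain)], wet)
    )


def query(target, evidence):
    """Return P(target=True | evidence) by summing the joint over hidden variables."""
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        # Skip assignments that contradict the evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        totals[world[target]] += joint(*values)
    # Normalize so the two outcomes of the target sum to 1.
    return totals[True] / (totals[True] + totals[False])


p_rain_given_wet = query("Rain", {"WetGrass": True})
print(f"P(Rain=True | WetGrass=True) = {p_rain_given_wet:.4f}")
```

With these assumed tables the query evaluates to roughly 0.708. The same `query` function answers other questions, for example `query("Sprinkler", {"WetGrass": True, "Rain": True})`.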
This example is intentionally small, but it shows the core workflow:
- Define a directed acyclic graph.
- Attach a conditional probability table to each node.
- Multiply local probabilities to form joint probabilities.
- Sum over hidden variables to answer queries.
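For larger networks, exact enumeration becomes expensive: it sums over every assignment of the hidden variables, so the number of terms doubles with each additional binary hidden variable. That is why real systems use optimized inference algorithms.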
Inference in Practice
Inference means answering probability questions from partial evidence. Typical questions include:
- What is the probability of a disease given a set of symptoms?
- What is the probability a machine has failed given several sensor readings?
- Which hidden cause is most plausible after we observe several effects?
Exact methods include enumeration and variable elimination. Approximate methods include sampling approaches such as Gibbs sampling and likelihood weighting.
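As one example of an approximate method, the sketch below estimates P(Rain | WetGrass = true) by likelihood weighting on a four-node Cloudy/Sprinkler/Rain/WetGrass network. The probability tables are assumed illustrative values, and the function name and sample count are arbitrary choices for this example.

```python
import random

# Illustrative CPT values (assumed for this sketch).
P_CLOUDY = 0.5                           # P(Cloudy=True)
P_SPRINKLER = {True: 0.1, False: 0.5}    # P(Sprinkler=True | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}         # P(Rain=True | Cloudy)
P_WET = {                                # P(WetGrass=True | Sprinkler, Rain)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.00,
}


def likelihood_weighting(num_samples, seed=0):
    """Estimate P(Rain=True | WetGrass=True) by likelihood weighting."""
    rng = random.Random(seed)
    weight_rain = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        # Sample the non-evidence variables in topological order.
        cloudy = rng.random() < P_CLOUDY
        sprinkler = rng.random() < P_SPRINKLER[cloudy]
        rain = rng.random() < P_RAIN[cloudy]
        # The evidence WetGrass=True is not sampled; its likelihood
        # under the sampled parents becomes the sample's weight.
        weight = P_WET[(sprinkler, rain)]
        weight_total += weight
        if rain:
            weight_rain += weight
    return weight_rain / weight_total


estimate = likelihood_weighting(50_000)
print(f"Estimated P(Rain=True | WetGrass=True) = {estimate:.3f}")
```

For these tables the exact answer is about 0.708, so the estimate should land close to that value and tighten as the sample count grows. Unlike enumeration, the cost per sample is linear in the number of nodes.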
When Bayesian Networks Are Useful
Bayesian networks work well when:
- uncertainty is central to the problem
- you understand the dependency structure reasonably well
- interpretability matters
They are common in medical diagnosis, risk analysis, reliability modeling, and expert systems.
Common Pitfalls
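One common pitfall is adding too many edges. Each extra parent multiplies the size of the child's probability table (doubling it for a binary parent), so the model needs more data to estimate, and every added edge removes an independence assumption that would have kept the model compact.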
Another issue is choosing the wrong direction for relationships without understanding the consequence. Different graph structures can encode different independence assumptions, so changing an arrow is not just cosmetic.
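A small numerical check shows how much direction matters. The sketch below uses a hypothetical collider A -> C <- B with made-up probabilities: A and B are independent on their own, but become dependent once their common child C is observed. A chain A -> B -> C behaves in the opposite way, with A and C dependent until B is observed.

```python
from itertools import product

# Hypothetical CPTs for a collider A -> C <- B with binary variables.
P_A = 0.3                                # P(A=True)
P_B = 0.4                                # P(B=True)
P_C = {                                  # P(C=True | A, B)
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.70,
    (False, False): 0.05,
}


def prob(p_true, value):
    """Probability that a binary variable equals `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c):
    # P(A, B, C) = P(A) * P(B) * P(C | A, B)
    return prob(P_A, a) * prob(P_B, b) * prob(P_C[(a, b)], c)


def conditional(target, evidence):
    """P(target=True | evidence) by enumeration over (A, B, C)."""
    names = ["A", "B", "C"]
    numerator = denominator = 0.0
    for values in product([True, False], repeat=3):
        world = dict(zip(names, values))
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(*values)
            denominator += p
            if world[target]:
                numerator += p
    return numerator / denominator


p_a = conditional("A", {})
p_a_given_b = conditional("A", {"B": True})
p_a_given_c = conditional("A", {"C": True})
p_a_given_bc = conditional("A", {"B": True, "C": True})

print(f"P(A) = {p_a:.3f}, P(A | B) = {p_a_given_b:.3f}")
print(f"P(A | C) = {p_a_given_c:.3f}, P(A | B, C) = {p_a_given_bc:.3f}")
```

The first line prints two equal values, since B alone says nothing about A. The second line prints two different values: once C is known, learning B changes the belief in A, an effect often called explaining away.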
A third problem is filling probability tables with inconsistent numbers. Every conditional distribution must sum to 1 for each parent configuration. Small bookkeeping mistakes can make the whole model invalid.
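A simple validation pass catches this kind of bookkeeping error before inference runs. The sketch below assumes a CPT stored as a dictionary from parent configurations to distributions; the table, the outcome labels, and the `invalid_rows` helper are hypothetical, and one row is deliberately wrong.

```python
import math

# Hypothetical CPT for WetGrass: (Sprinkler, Rain) -> distribution over outcomes.
wet_grass_cpt = {
    (True, True): {"wet": 0.99, "dry": 0.01},
    (True, False): {"wet": 0.90, "dry": 0.10},
    (False, True): {"wet": 0.90, "dry": 0.10},
    (False, False): {"wet": 0.05, "dry": 0.90},  # deliberate error: sums to 0.95
}


def invalid_rows(cpt, tol=1e-9):
    """Return the parent configurations whose distribution is not valid."""
    bad = []
    for parent_values, dist in cpt.items():
        in_range = all(0.0 <= p <= 1.0 for p in dist.values())
        sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
        if not (in_range and sums_to_one):
            bad.append(parent_values)
    return bad


print(invalid_rows(wet_grass_cpt))
```

Here the check reports the `(False, False)` row, whose probabilities sum to 0.95 rather than 1.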
Finally, many people underestimate inference cost. Even if the graph looks simple on paper, exact inference is NP-hard in general, and it becomes expensive in practice when nodes have many parents or the graph is densely connected. That is where model design matters as much as raw probability theory.
Summary
- A Bayesian network combines a directed acyclic graph with local conditional probability tables.
- Its main strength is compactly factorizing a joint distribution.
- The graph encodes conditional independence assumptions.
- Inference answers probability queries from partial evidence.
- Small networks are easy to reason about, but large ones require careful structure and efficient inference methods.

