Discrete Bayesian Network on TensorFlow Probability, Edward2, and Python
Introduction
A discrete Bayesian network is a directed acyclic graph whose nodes are discrete random variables and whose edges encode conditional dependence. In TensorFlow Probability, the most practical way to express such a model is usually a joint distribution built from Bernoulli or categorical distributions. Edward2-style probabilistic programming can describe the same structure; only the API style differs.
A Tiny Discrete Bayesian Network
Consider a classic three-node network:
- Rain
- Sprinkler, which depends on Rain
- WetGrass, which depends on both Rain and Sprinkler
This is a discrete Bayesian network because all variables are categorical or Bernoulli and the graph is acyclic.
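Written as a factorization, the graph corresponds to the joint distribution below. The symbols R, S, and W are shorthand introduced here for Rain, Sprinkler, and WetGrass; each factor is one node conditioned on its parents.

```latex
P(R, S, W) = P(R)\, P(S \mid R)\, P(W \mid R, S)
```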
A TensorFlow Probability Version
Here is a simple TFP model using JointDistributionCoroutineAutoBatched.
This is already a valid discrete Bayesian network. Each node is a distribution conditioned on its parent values.
Where Edward2 Fits
Edward2 historically offered a traced-random-variable style for writing probabilistic programs. The modeling idea is still the same: you create random variables in dependency order and let the program structure represent the graph.
Conceptually, Edward2 is helpful when you want a probabilistic-programming style where random variables are first-class objects. But for a small discrete Bayesian network, a TFP joint distribution is often the most straightforward representation.
So the practical distinction is less about mathematics and more about API style.
Conditioning and Inference
Defining the network is only the first step. The next question is usually inference: given observations, what are the posterior probabilities of the unobserved nodes?
For a small discrete network, one practical approach is brute-force enumeration of all assignments. For example, you can evaluate the joint log-probability of every state and normalize manually.
For tiny networks, this is easy to understand and often clearer than introducing a full approximate inference stack immediately.
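The enumeration idea can be sketched in plain Python, with a hand-written joint log-probability standing in for a TFP model's log_prob. The CPT values and the helper names (bernoulli_log_prob, joint_log_prob, posterior_rain_given_wet) are hypothetical and chosen only for illustration.

```python
import itertools
import math

# Illustrative CPTs (assumed values, not from the text).
P_RAIN = 0.2                                  # P(Rain=1)
P_SPRINKLER = {0: 0.4, 1: 0.01}               # P(Sprinkler=1 | Rain)
P_WET = {(0, 0): 0.01, (0, 1): 0.9,
         (1, 0): 0.8, (1, 1): 0.99}           # P(WetGrass=1 | Rain, Sprinkler)


def bernoulli_log_prob(p_one, value):
    """Log-probability of a binary value given P(value=1)."""
    return math.log(p_one if value == 1 else 1.0 - p_one)


def joint_log_prob(rain, sprinkler, wet):
    """Sum of the three node log-probabilities, following the graph."""
    return (bernoulli_log_prob(P_RAIN, rain)
            + bernoulli_log_prob(P_SPRINKLER[rain], sprinkler)
            + bernoulli_log_prob(P_WET[(rain, sprinkler)], wet))


def posterior_rain_given_wet(wet=1):
    """P(Rain | WetGrass=wet), summing the joint over Sprinkler."""
    unnormalized = {0: 0.0, 1: 0.0}
    for rain, sprinkler in itertools.product([0, 1], repeat=2):
        unnormalized[rain] += math.exp(joint_log_prob(rain, sprinkler, wet))
    total = sum(unnormalized.values())
    return {rain: mass / total for rain, mass in unnormalized.items()}


posterior = posterior_rain_given_wet(wet=1)
print(posterior)
```

With a network this small there are only eight joint assignments, so the loop is exact. The cost grows exponentially with the number of nodes, which is the point at which approximate inference starts to pay off.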
Learning Parameters vs Encoding Them Manually
The example above hardcodes the conditional probability tables. That is fine when you are building a known model by hand. If you want to learn the CPT values from data, then the CPT entries become parameters, and you need an optimization or Bayesian inference step on top of the network definition.
That is an important design split:
- structure known, probabilities known: just encode the network
- structure known, probabilities unknown: fit the CPT parameters
- structure unknown: structure learning becomes a separate problem entirely
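For the middle case, and assuming every variable is observed in every record, the maximum-likelihood estimate of a CPT entry reduces to counting. The sketch below estimates P(Sprinkler=1 | Rain) that way; the function name fit_sprinkler_cpt and the records are made up for illustration, and no smoothing is applied.

```python
from collections import Counter


def fit_sprinkler_cpt(records):
    """Estimate P(Sprinkler=1 | Rain) from fully observed (rain, sprinkler) pairs."""
    parent_counts = Counter()   # how often each Rain value occurs
    on_counts = Counter()       # how often Sprinkler=1 occurs with that Rain value
    for rain, sprinkler in records:
        parent_counts[rain] += 1
        on_counts[rain] += sprinkler
    return {rain: on_counts[rain] / parent_counts[rain]
            for rain in parent_counts}


# Hypothetical observations, invented for this example.
records = [(0, 1), (0, 0), (0, 1), (0, 0),
           (1, 0), (1, 0), (1, 0), (1, 1)]
cpt = fit_sprinkler_cpt(records)
print(cpt)
```

With missing observations or a prior over the CPT entries, counting is no longer enough, and the optimization or Bayesian inference step mentioned above takes its place.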
Common Pitfalls
The biggest pitfall is thinking TensorFlow Probability will infer the network structure automatically from Python control flow. You still need to define the conditional distributions explicitly.
Another common mistake is using continuous distributions by habit when the problem is genuinely discrete. For a discrete Bayesian network, Bernoulli and categorical distributions are usually the right building blocks.
Developers also jump straight to advanced inference machinery when exact enumeration would be simpler and clearer for a small toy model.
Summary
- A discrete Bayesian network is a DAG of discrete conditional distributions.
- In TensorFlow Probability, a joint distribution built from Bernoulli or categorical nodes is a clean way to represent it.
- Edward2 expresses the same kind of model in a different probabilistic-programming style.
- Small discrete networks can often be analyzed with direct enumeration before using heavier inference tools.
- Separate clearly whether you are encoding known CPTs or trying to learn them from data.

