Discrete Bayesian network in TensorFlow Probability, Edward2, and Python


Introduction

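A discrete Bayesian network is a directed acyclic graph (DAG) whose nodes are discrete random variables and whose edges point from each variable to the variables that depend on it directly. Each node carries a conditional probability table (CPT) that gives its distribution for every combination of its parents' values, and the joint distribution is the product of these conditional distributions. In TensorFlow Probability (TFP), the most practical way to express such a model is usually a joint distribution built from Bernoulli or categorical distributions. Edward2's probabilistic-programming style can describe the same structure; the difference lies in the API, not in the model.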

A Tiny Discrete Bayesian Network

Consider a classic three-node network:

Rain
Sprinkler, which depends on Rain
WetGrass, which depends on both Rain and Sprinkler

This is a discrete Bayesian network because every variable is binary (Bernoulli) and the graph has no directed cycles.
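Written out, this is the usual Bayesian-network factorization: the joint distribution is the product of each node's distribution given its parents, Pa(X_i). For this example, with R, S, and W abbreviating Rain, Sprinkler, and WetGrass:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

P(R, S, W) = P(R)\, P(S \mid R)\, P(W \mid R, S)
```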

A TensorFlow Probability Version

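Here is a simple TFP model using JointDistributionCoroutineAutoBatched. Each yield statement defines one node, and a node's parents are the earlier yielded values it uses.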

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions


def model():
    # Root node: P(rain = 1) = 0.3
    rain = yield tfd.Bernoulli(probs=0.3, dtype=tf.int32, name="rain")
    # P(sprinkler = 1 | rain): 0.1 if it rains, 0.5 otherwise
    sprinkler = yield tfd.Bernoulli(
        probs=tf.where(tf.equal(rain, 1), 0.1, 0.5),
        dtype=tf.int32,
        name="sprinkler",
    )

    # P(wet_grass = 1 | rain, sprinkler), one entry per parent combination
    wet_probs = tf.where(
        tf.equal(rain, 1),
        tf.where(tf.equal(sprinkler, 1), 0.99, 0.8),
        tf.where(tf.equal(sprinkler, 1), 0.9, 0.0),
    )

    yield tfd.Bernoulli(probs=wet_probs, dtype=tf.int32, name="wet_grass")


joint = tfd.JointDistributionCoroutineAutoBatched(model)

sample = joint.sample()  # one joint draw of (rain, sprinkler, wet_grass)
print(sample)
print(joint.log_prob(sample).numpy())  # log P(rain, sprinkler, wet_grass)
```

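This is already a complete discrete Bayesian network: each node is a Bernoulli distribution whose probability depends on the sampled values of its parents, and joint.log_prob adds up the three conditional log-probabilities.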
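To make the parent-conditioning concrete without TensorFlow, here is a minimal plain-Python sketch that copies the CPT values from the TFP model into dictionaries and multiplies the three factors. The names P_RAIN, P_SPRINKLER, P_WET, bernoulli, and joint_prob are illustrative helpers, not part of any library. For the state rain = 1, sprinkler = 0, wet_grass = 1 the product is 0.3 × 0.9 × 0.8 = 0.216.

```python
from itertools import product

# P(rain = 1); values copied from the TFP model above
P_RAIN = 0.3
# P(sprinkler = 1 | rain), keyed by the value of rain
P_SPRINKLER = {0: 0.5, 1: 0.1}
# P(wet_grass = 1 | rain, sprinkler), keyed by (rain, sprinkler)
P_WET = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}


def bernoulli(p_one, value):
    """Probability of observing `value` (0 or 1) under Bernoulli(p_one)."""
    return p_one if value == 1 else 1.0 - p_one


def joint_prob(rain, sprinkler, wet_grass):
    """P(R, S, W) = P(R) * P(S | R) * P(W | R, S)."""
    return (
        bernoulli(P_RAIN, rain)
        * bernoulli(P_SPRINKLER[rain], sprinkler)
        * bernoulli(P_WET[(rain, sprinkler)], wet_grass)
    )


print(joint_prob(1, 0, 1))  # approximately 0.216 (floating point)

# The eight joint probabilities must sum to 1 (up to rounding).
total = sum(joint_prob(*state) for state in product([0, 1], repeat=3))
print(total)
```

With the TFP model above in scope, exponentiating joint.log_prob([1, 0, 1]) should give the same value up to float32 rounding.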

Where Edward2 Fits

Edward2 historically offered a traced-random-variable style for writing probabilistic programs. The modeling idea is still the same: you create random variables in dependency order and let the program structure represent the graph.

Conceptually, Edward2 is helpful when you want a probabilistic-programming style where random variables are first-class objects. But for a small discrete Bayesian network, a TFP joint distribution is often the most straightforward representation.

So the practical distinction is less about mathematics and more about API style.

Conditioning and Inference

Defining the network is only the first step. The next question is usually inference: given observations, what are the posterior probabilities of the unobserved nodes?

For a small discrete network, one practical approach is exact inference by brute-force enumeration: evaluate the joint probability of every assignment, keep the assignments that agree with the evidence, and normalize manually by dividing by their total.
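As a worked example of that normalization, take the evidence WetGrass = 1 and the CPT values from the model above (R, S, W abbreviate Rain, Sprinkler, WetGrass). Summing the joint over the unobserved Sprinkler and dividing by the total probability of the evidence gives:

```latex
P(R{=}1, W{=}1) = \sum_{s} P(R{=}1)\,P(S{=}s \mid R{=}1)\,P(W{=}1 \mid R{=}1, S{=}s)
  = 0.3 \cdot 0.9 \cdot 0.8 + 0.3 \cdot 0.1 \cdot 0.99 = 0.2457

P(W{=}1) = \sum_{r,s} P(R{=}r)\,P(S{=}s \mid R{=}r)\,P(W{=}1 \mid R{=}r, S{=}s)
  = 0.7 \cdot 0.5 \cdot 0 + 0.7 \cdot 0.5 \cdot 0.9 + 0.2457 = 0.5607

P(R{=}1 \mid W{=}1) = \frac{P(R{=}1, W{=}1)}{P(W{=}1)} = \frac{0.2457}{0.5607} \approx 0.438
```

Observing wet grass raises the probability of rain from its prior of 0.3 to about 0.44.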

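```python
import numpy as np

# Enumerate all 2**3 assignments (uses `joint` from the model above).
states = []
for rain in [0, 1]:
    for sprinkler in [0, 1]:
        for wet_grass in [0, 1]:
            state = [rain, sprinkler, wet_grass]
            logp = joint.log_prob(state).numpy()
            states.append((state, logp))

for state, logp in states:
    print(state, logp)

# Normalize manually: keep states with wet_grass = 1, then compute P(rain = 1 | wet_grass = 1).
evidence = [(s, np.exp(lp)) for s, lp in states if s[2] == 1]
p_evidence = sum(p for _, p in evidence)
p_rain = sum(p for s, p in evidence if s[0] == 1) / p_evidence
print("P(rain=1 | wet_grass=1) =", p_rain)
```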

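For tiny networks, this is easy to understand and often clearer than introducing a full approximate inference stack immediately. The cost grows as 2^n for n binary nodes, so enumeration only stays practical while the network is small.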

Learning Parameters vs Encoding Them Manually

The example above hardcodes the conditional probability tables. That is fine when you are building a known model by hand. If you want to learn the CPT values from data, then the CPT entries become parameters, and you need an optimization or Bayesian inference step on top of the network definition.

That is an important design split:

structure known, probabilities known: just encode the network
structure known, probabilities unknown: fit the CPT parameters
structure unknown: structure learning becomes a separate problem entirely
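For the middle case (structure known, probabilities unknown) with fully observed data, one simple option is maximum-likelihood estimation by counting: each CPT entry is the fraction of records with that child value among the records sharing the same parent values. The sketch below does this for the Sprinkler node only. The records list is made-up toy data, and the function name fit_sprinkler_cpt and its alpha pseudo-count (Laplace smoothing) are illustrative assumptions; with alpha > 0 the estimate equals the posterior mean under a symmetric Beta(alpha, alpha) prior.

```python
from collections import Counter

# Hypothetical, fully observed toy records: (rain, sprinkler, wet_grass)
records = [
    (0, 1, 1), (0, 0, 0), (0, 1, 1), (1, 0, 1),
    (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0),
]


def fit_sprinkler_cpt(data, alpha=0.0):
    """Estimate P(sprinkler = 1 | rain) by counting.

    alpha is an optional pseudo-count; alpha = 0 gives the plain
    maximum-likelihood estimate. Returns None for a parent value
    that never occurs when alpha = 0 (the estimate is undefined).
    """
    parent_counts = Counter(rain for rain, _, _ in data)
    both_counts = Counter(rain for rain, sprinkler, _ in data if sprinkler == 1)
    cpt = {}
    for rain in (0, 1):
        denom = parent_counts[rain] + 2 * alpha
        cpt[rain] = (both_counts[rain] + alpha) / denom if denom else None
    return cpt


print(fit_sprinkler_cpt(records))             # maximum-likelihood estimates
print(fit_sprinkler_cpt(records, alpha=1.0))  # Laplace-smoothed estimates
```

On these toy records the unsmoothed estimates are 3/5 for rain = 0 and 1/3 for rain = 1. The fitted values could then replace the hardcoded 0.5 and 0.1 in the TFP model.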

Common Pitfalls

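The biggest pitfall is expecting TensorFlow Probability to work out the network for you. The graph is exactly the set of parent dependencies you write into each conditional distribution, and TFP does not learn structure from data, so you still need to define every conditional distribution explicitly.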

Another common mistake is using continuous distributions by habit when the problem is genuinely discrete. For a discrete Bayesian network, Bernoulli and categorical distributions are usually the right building blocks.

Developers also jump straight to advanced inference machinery, such as MCMC or variational inference, when exact enumeration would be simpler and clearer for a small toy model.

Summary

A discrete Bayesian network is a DAG of discrete conditional distributions.
In TensorFlow Probability, a joint distribution built from Bernoulli or categorical nodes is a clean way to represent it.
Edward2 expresses the same kind of model in a different probabilistic-programming style.
Small discrete networks can often be analyzed with direct enumeration before using heavier inference tools.
Separate clearly whether you are encoding known CPTs or trying to learn them from data.