Creating a random tree?

random tree generation

algorithm design

data structures

tree algorithms

computer science

Creating a random tree?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Creating a random tree sounds simple until you decide what "random" should mean. Different generation methods produce very different distributions, so the right algorithm depends on whether you want a quick arbitrary tree, a random rooted tree, or a uniformly random labeled tree.

Decide What Kind Of Randomness You Need

A tree can be random in several senses:

each new node chooses a random parent from earlier nodes
each labeled tree on n nodes is equally likely
a random spanning tree is drawn from a larger graph

Those are not equivalent. If you do not define the distribution first, it is easy to implement something random-looking but mathematically biased.

Simple Parent-Choice Method

A very easy generator for a rooted tree on nodes 0..n-1 is:

start with node 0 as the root
for each new node i, choose a parent uniformly from 0..i-1

python

1import random
2
3
4def random_tree(n, seed=None):
5    rng = random.Random(seed)
6    parents = [-1]
7    for i in range(1, n):
8        parents.append(rng.randrange(i))
9    return parents
10
11
12print(random_tree(8, seed=42))

This is fast and useful for testing, but it does not generate all labeled trees with equal probability.

Uniform Random Labeled Trees With Prüfer Codes

If you want every labeled tree on n vertices to be equally likely, Prüfer sequences are the standard method. A labeled tree on n nodes corresponds uniquely to a Prüfer code of length n - 2, where each entry is an integer from 0 to n - 1.

So the algorithm is:

generate a random Prüfer sequence of length n - 2
decode it into a tree

python

1import heapq
2import random
3
4
5def random_prufer_tree(n, seed=None):
6    rng = random.Random(seed)
7    code = [rng.randrange(n) for _ in range(n - 2)]
8    degree = [1] * n
9
10    for x in code:
11        degree[x] += 1
12
13    leaves = [i for i in range(n) if degree[i] == 1]
14    heapq.heapify(leaves)
15    edges = []
16
17    for x in code:
18        leaf = heapq.heappop(leaves)
19        edges.append((leaf, x))
20        degree[leaf] -= 1
21        degree[x] -= 1
22        if degree[x] == 1:
23            heapq.heappush(leaves, x)
24
25    a = heapq.heappop(leaves)
26    b = heapq.heappop(leaves)
27    edges.append((a, b))
28    return edges
29
30
31print(random_prufer_tree(6, seed=7))

This method gives a uniform random labeled tree, which is usually what people mean in combinatorics.

Random Spanning Tree On A Graph

If you are not generating from scratch but from an existing graph, the goal is often a random spanning tree. In that setting, algorithms such as Wilson’s algorithm or Aldous-Broder are more appropriate. They sample spanning trees of the given graph rather than arbitrary trees on n labeled nodes.

That distinction matters because the surrounding graph constrains which trees are even possible.

Representing The Result

A random tree can be returned in several formats:

parent array
edge list
adjacency list

Choose the one that matches the next step of your application. A parent array is compact for rooted trees, while an edge list is more general and works well for graph algorithms.

Common Pitfalls

The most common mistake is saying "random tree" without defining the target distribution. A random parent generator and a uniform labeled-tree generator are not the same thing.

Another mistake is using the simple parent-choice method when theoretical uniformity matters. It is fine for simulation or test data, but not for combinatorial experiments that assume equal probability across trees.

A third issue is confusing rooted trees with unrooted labeled trees. The output representation changes the interpretation of the randomness.

Summary

"Random tree" is ambiguous until you define the probability model.
Choosing a random parent for each new node is simple but biased.
Prüfer codes generate uniformly random labeled trees.
Random spanning trees on existing graphs require different algorithms.
Pick a representation such as parent array or edge list based on what comes next.