Kevin Bacon number
Six degrees of separation
Network theory
Actor connectivity
Film industry links

Calculating Kevin Bacon Numbers

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

A Kevin Bacon number measures the shortest chain of co-starring relationships between any actor and Kevin Bacon. Kevin Bacon himself has a Bacon number of 0, an actor who appeared in a film with him has a number of 1, an actor connected through one intermediary has a number of 2, and so on. Computing Bacon numbers is a classic application of Breadth-First Search (BFS) on an undirected graph, and the same algorithm generalizes to any "degrees of separation" problem in network analysis.

How Kevin Bacon Numbers Are Defined

The rules are straightforward:

  • Kevin Bacon has a Bacon number of 0
  • An actor who appeared in a movie with Kevin Bacon has a Bacon number of 1
  • An actor who appeared in a movie with a Bacon-1 actor (but not directly with Bacon) has a Bacon number of 2
  • The pattern continues for higher numbers
  • An actor with no connection to Kevin Bacon has a Bacon number of infinity (undefined)

For example:

ActorConnection PathBacon Number
Kevin Bacon(self)0
Tom HanksApollo 13 (with Bacon)1
Robin WrightForrest Gump (with Hanks)2
Morgan FreemanSe7en (with Pitt), Sleepers (Pitt with Bacon)2

The average Bacon number across all actors in the Internet Movie Database is around 2.9, illustrating the "small world" property of the film industry network.

Graph Representation

The actor-movie relationship maps naturally to a graph:

  • Nodes: Each actor is a node
  • Edges: Two actors share an edge if they appeared in the same movie
  • Weight: All edges have equal weight (each co-starring link counts as 1)

This creates an undirected, unweighted graph. A movie with N actors creates edges between every pair, contributing N*(N-1)/2 edges for that film.

An alternative representation uses a bipartite graph with both actor nodes and movie nodes:

  • Actor nodes connect to movie nodes they appeared in
  • The Bacon number is half the BFS distance in the bipartite graph (since the path alternates actor-movie-actor-movie)

The bipartite approach is more memory-efficient for movies with large casts, since a movie node connects to N actors (N edges) rather than creating N*(N-1)/2 edges between all actor pairs.

BFS Algorithm for Computing Bacon Numbers

BFS is the standard algorithm for finding shortest paths in an unweighted graph. Starting from Kevin Bacon, it explores all actors at distance 1, then distance 2, and so on.

Python Implementation

python
1from collections import deque, defaultdict
2
3def build_actor_graph(movies):
4    """Build adjacency list from movie cast lists.
5    
6    Args:
7        movies: dict mapping movie name to list of actors
8    Returns:
9        dict mapping each actor to set of co-stars
10    """
11    graph = defaultdict(set)
12    for movie, cast in movies.items():
13        for i, actor in enumerate(cast):
14            for co_star in cast[i + 1:]:
15                graph[actor].add(co_star)
16                graph[co_star].add(actor)
17    return graph
18
19def compute_bacon_numbers(graph, source="Kevin Bacon"):
20    """Compute shortest distance from source to all reachable actors.
21    
22    Args:
23        graph: adjacency list (dict of sets)
24        source: starting actor
25    Returns:
26        dict mapping each actor to their Bacon number
27    """
28    distances = {source: 0}
29    queue = deque([source])
30    
31    while queue:
32        current = queue.popleft()
33        for neighbor in graph[current]:
34            if neighbor not in distances:
35                distances[neighbor] = distances[current] + 1
36                queue.append(neighbor)
37    
38    return distances
39
40# Example usage
41movies = {
42    "Apollo 13": ["Kevin Bacon", "Tom Hanks", "Bill Paxton"],
43    "Forrest Gump": ["Tom Hanks", "Robin Wright", "Gary Sinise"],
44    "A Few Good Men": ["Tom Cruise", "Jack Nicholson", "Kevin Bacon"],
45    "Top Gun": ["Tom Cruise", "Val Kilmer", "Kelly McGillis"],
46}
47
48graph = build_actor_graph(movies)
49bacon_numbers = compute_bacon_numbers(graph)
50
51for actor, number in sorted(bacon_numbers.items(), key=lambda x: x[1]):
52    print(f"{actor}: {number}")

Output:

 
1Kevin Bacon: 0
2Tom Hanks: 1
3Bill Paxton: 1
4Tom Cruise: 1
5Jack Nicholson: 1
6Robin Wright: 2
7Gary Sinise: 2
8Val Kilmer: 2
9Kelly McGillis: 2

Java Implementation

java
1import java.util.*;
2
3public class BaconNumbers {
4    
5    public static Map<String, Integer> computeBaconNumbers(
6            Map<String, Set<String>> graph, String source) {
7        Map<String, Integer> distances = new HashMap<>();
8        Queue<String> queue = new LinkedList<>();
9        
10        distances.put(source, 0);
11        queue.add(source);
12        
13        while (!queue.isEmpty()) {
14            String current = queue.poll();
15            int currentDist = distances.get(current);
16            
17            for (String neighbor : graph.getOrDefault(current, Set.of())) {
18                if (!distances.containsKey(neighbor)) {
19                    distances.put(neighbor, currentDist + 1);
20                    queue.add(neighbor);
21                }
22            }
23        }
24        return distances;
25    }
26    
27    public static void main(String[] args) {
28        Map<String, Set<String>> graph = new HashMap<>();
29        // Build graph from movie data...
30        
31        Map<String, Integer> numbers = computeBaconNumbers(graph, "Kevin Bacon");
32        numbers.forEach((actor, num) -> 
33            System.out.println(actor + ": " + num));
34    }
35}

Time and Space Complexity

AspectComplexityExplanation
Time (BFS)O(V + E)Visit every actor (V) and traverse every co-starring edge (E) once
Space (graph)O(V + E)Adjacency list storage
Space (BFS)O(V)Queue and distance map
Graph constructionO(sum of cast_size^2)Each movie creates edges between all actor pairs

For the full IMDB dataset (roughly 3 million actors and 500,000+ movies), the graph has hundreds of millions of edges. The BFS itself runs in seconds on modern hardware, but graph construction and memory are the real bottlenecks.

Optimizations for Large Datasets

Bipartite Graph Approach

Instead of creating edges between every actor pair in a movie, use movie nodes as intermediaries:

python
1def build_bipartite_graph(movies):
2    """Build bipartite actor-movie graph.
3    
4    Bacon number = BFS distance // 2 in the bipartite graph.
5    """
6    graph = defaultdict(set)
7    for movie, cast in movies.items():
8        movie_node = f"MOVIE:{movie}"
9        for actor in cast:
10            graph[actor].add(movie_node)
11            graph[movie_node].add(actor)
12    return graph
13
14def compute_bacon_numbers_bipartite(graph, source="Kevin Bacon"):
15    distances = {source: 0}
16    queue = deque([source])
17    
18    while queue:
19        current = queue.popleft()
20        for neighbor in graph[current]:
21            if neighbor not in distances:
22                distances[neighbor] = distances[current] + 1
23                queue.append(neighbor)
24    
25    # Filter out movie nodes and halve distances
26    return {
27        actor: dist // 2
28        for actor, dist in distances.items()
29        if not actor.startswith("MOVIE:")
30    }

This reduces memory from O(C^2) per movie (where C is cast size) to O(C) per movie.

Bidirectional BFS

When finding the Bacon number for a single target actor, bidirectional BFS searches from both Kevin Bacon and the target simultaneously, meeting in the middle. This reduces the explored space from O(b^d) to O(b^(d/2)), where b is the branching factor and d is the distance. The idea is to alternate BFS expansions from both ends and stop as soon as the two frontiers intersect.

Real-World Applications Beyond Movies

The same algorithm applies to any "degrees of separation" computation:

DomainNodesEdgesExample
Academic collaborationAuthorsCo-authored papersErdos numbers
Social networksUsersFriendships / followsLinkedIn connection degree
Protein interactionProteinsPhysical interactionsFunctional distance
Citation networksPapersCitationsIntellectual influence
Software dependenciesPackagesImport / dependencyTransitive dependency depth

The Erdos number (measuring distance to mathematician Paul Erdos through co-authored papers) predates the Bacon number and uses the identical BFS computation.

Common Pitfalls

  • Using DFS instead of BFS. Depth-First Search finds a path, but not necessarily the shortest path. BFS guarantees shortest path in unweighted graphs because it explores all nodes at distance k before any node at distance k+1.
  • Creating O(N^2) edges per movie. A movie with 100 cast members generates 4,950 edges in the direct graph. Use the bipartite representation for large datasets to avoid memory explosion.
  • Ignoring disconnected components. Not every actor is reachable from Kevin Bacon. Some actors appear only in films with entirely separate casts. Handle unreachable actors explicitly (return infinity or a sentinel value).
  • Double-counting in bipartite BFS. When using a bipartite graph, the raw BFS distance is twice the Bacon number because each step alternates between actor and movie nodes. Divide by 2 when reporting results.
  • Not handling self-loops or duplicate edges. The same pair of actors may co-star in multiple movies. Using a set-based adjacency list naturally deduplicates edges, but list-based representations will traverse redundant edges and waste time.

Summary

  • A Kevin Bacon number is the shortest co-starring path length from any actor to Kevin Bacon.
  • BFS on an undirected, unweighted graph is the standard algorithm, running in O(V + E) time.
  • The bipartite graph representation (actors and movies as separate node types) is more memory-efficient for large datasets.
  • Bidirectional BFS is faster for single-target queries, reducing explored space exponentially.
  • The same algorithm applies to Erdos numbers, social network degrees, and any shortest-path problem on unweighted graphs.
  • Always use BFS, never DFS, when the goal is the shortest path in an unweighted graph.

Course illustration
Course illustration

All Rights Reserved.