Get node list from random walk in networkX

networkX

random walk

graph analysis

Python programming

node list

Get node list from random walk in networkX

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

A random walk in a graph is an ordered list of visited nodes, not just a set of nodes. In NetworkX, this list is often the direct input for sampling, recommendation experiments, or graph embedding pipelines. A good implementation should be reproducible, explicit about directed behavior, and clear about what happens at dead ends.

Build a Basic Random Walk Function

Start with a simple helper that takes a start node and step count, then chooses one neighbor at each step. Return the path list so downstream code can compute transitions and visit frequencies.

python

1import random
2import networkx as nx
3from typing import Hashable
4
5
6def random_walk_nodes(
7    graph: nx.Graph,
8    start: Hashable,
9    steps: int,
10    seed: int | None = None,
11) -> list[Hashable]:
12    if start not in graph:
13        raise ValueError("start node not found")
14    if steps < 0:
15        raise ValueError("steps must be non-negative")
16
17    rng = random.Random(seed)
18    path = [start]
19    current = start
20
21    for _ in range(steps):
22        neighbors = list(graph.neighbors(current))
23        if not neighbors:
24            break
25        current = rng.choice(neighbors)
26        path.append(current)
27
28    return path
29
30
31G = nx.path_graph(6)
32print(random_walk_nodes(G, start=0, steps=10, seed=42))

With a fixed seed, output is stable and easier to debug.

Handle Directed Graphs Explicitly

For directed graphs, use outgoing neighbors. A node can become a dead end if it has no successors. You should define a dead-end policy instead of leaving behavior implicit.

Typical policies:

stop the walk
restart from the original node
teleport to a random node

python

1import random
2import networkx as nx
3
4
5def directed_walk(
6    graph: nx.DiGraph,
7    start,
8    steps: int,
9    seed: int | None = None,
10    dead_end_policy: str = "stop",
11):
12    rng = random.Random(seed)
13    path = [start]
14    current = start
15    all_nodes = list(graph.nodes())
16
17    for _ in range(steps):
18        out_neighbors = list(graph.successors(current))
19
20        if not out_neighbors:
21            if dead_end_policy == "stop":
22                break
23            if dead_end_policy == "restart":
24                current = start
25                path.append(current)
26                continue
27            if dead_end_policy == "teleport":
28                current = rng.choice(all_nodes)
29                path.append(current)
30                continue
31            raise ValueError("unsupported dead_end_policy")
32
33        current = rng.choice(out_neighbors)
34        path.append(current)
35
36    return path

Document the policy because it changes path statistics significantly.

Add Weighted Transition Support

Uniform neighbor selection is not always correct. If edge weights represent transition preference, sample with weighted probabilities.

python

1import random
2import networkx as nx
3
4
5def weighted_walk(graph: nx.DiGraph, start, steps: int, seed: int | None = None):
6    rng = random.Random(seed)
7    current = start
8    path = [current]
9
10    for _ in range(steps):
11        edges = list(graph.out_edges(current, data=True))
12        if not edges:
13            break
14
15        candidates = [v for _, v, _ in edges]
16        weights = [data.get("weight", 1.0) for _, _, data in edges]
17
18        if sum(weights) <= 0:
19            break
20
21        current = rng.choices(candidates, weights=weights, k=1)[0]
22        path.append(current)
23
24    return path
25
26
27DG = nx.DiGraph()
28DG.add_weighted_edges_from([
29    ("A", "B", 0.8),
30    ("A", "C", 0.2),
31    ("B", "A", 1.0),
32    ("C", "A", 1.0),
33])
34
35print(weighted_walk(DG, "A", 12, seed=7))

This approach is common in user navigation and sequence simulation tasks.

Analyze the Node List Output

Once you have path output, compute metrics to validate behavior.

python

1from collections import Counter
2
3path = random_walk_nodes(G, start=0, steps=200, seed=123)
4visit_counts = Counter(path)
5transition_counts = Counter(zip(path, path[1:]))
6
7print("unique visited:", len(visit_counts))
8print("top nodes:", visit_counts.most_common(3))
9print("top transitions:", transition_counts.most_common(3))

Store graph version and seed with results. Otherwise, comparisons between runs can be misleading.

For embedding-style workloads, generate many walks per start node and keep each walk as a separate list.

python

1def generate_walk_corpus(graph: nx.Graph, starts: list[int], walks_per_node: int, length: int):
2    corpus = []
3    seed = 1000
4
5    for node in starts:
6        for _ in range(walks_per_node):
7            corpus.append(random_walk_nodes(graph, node, length, seed=seed))
8            seed += 1
9
10    return corpus

This corpus structure integrates directly with sequence-based training pipelines.

Common Pitfalls

A common pitfall is returning only unique visited nodes, which loses ordering and transition information. Another is forgetting to seed the random generator during experiments, making results hard to reproduce. Directed graphs are often treated with undirected neighbor logic, causing invalid transitions. Dead-end handling is frequently ignored and silently truncates walks. Weighted walks also fail when missing weight keys are not handled consistently.

Summary

Return an ordered node path for each random walk.
Use explicit dead-end policy, especially for directed graphs.
Add random seed support for reproducible experiments.
Use weighted sampling when edge weights have semantic meaning.
Compute visit and transition statistics to validate walk behavior.
Store config metadata with outputs for reliable comparison.