
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True

Introduction

This PyTorch error means you are trying to reuse a computation graph after backward() has already consumed and freed the saved intermediate values needed for gradient calculation. In most training code, the fix is not "always add retain_graph=True." The real fix is to understand why the graph is being reused and choose the right pattern for that situation.

Why PyTorch Frees the Graph

PyTorch builds a dynamic computation graph during the forward pass. When you call loss.backward(), autograd walks that graph in reverse, computes gradients, and then frees saved buffers to conserve memory.

That is why this works once:

python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x
y.backward()

print(x.grad)

But this fails:

python
y.backward()

After the first backward pass, PyTorch has already released the information it needed from that graph.
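To see the failure without crashing a script, the second call can be wrapped in a try/except. This is a minimal sketch; the names first_grad and second_call_failed are introduced here only for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x

y.backward()                    # first pass succeeds, then the saved tensors are freed
first_grad = x.grad.clone()     # d(x*x)/dx at x = 2

try:
    y.backward()                # second pass over the same, already-freed graph
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)
```

Catching the exception like this is useful for experimentation only. In real training code the goal is to remove the second traversal, not to swallow the error.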

The Most Common Cause

The error usually appears when code accidentally calls backward() twice on values produced by the same forward pass.

python
import torch

x = torch.tensor(3.0, requires_grad=True)
loss = (x * 2) ** 2

loss.backward()  # succeeds and frees the graph's saved tensors
loss.backward()  # raises RuntimeError: same graph, buffers already freed

The second call fails because loss still points to the old graph, and that graph is no longer intact.

Correct Fix 1: Recompute the Forward Pass

In normal training, the usual fix is to run the forward computation again before the next backward call.

python
import torch

x = torch.tensor(3.0, requires_grad=True)

loss = (x * 2) ** 2
loss.backward()

x.grad.zero_()  # reset the accumulated gradient before the next pass

loss = (x * 2) ** 2  # fresh forward pass builds a new graph
loss.backward()

print(x.grad)

This is the standard pattern because each optimization step should normally build a fresh graph from a fresh forward pass.
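In a typical training loop the same idea looks like the sketch below. The model, data shapes, learning rate, and variable names are hypothetical and chosen only to keep the example self-contained.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(16, 4)
targets = torch.randn(16, 1)

losses = []
for step in range(5):
    optimizer.zero_grad()                 # clear gradients left over from the previous step
    predictions = model(inputs)           # fresh forward pass builds a fresh graph
    loss = torch.nn.functional.mse_loss(predictions, targets)
    loss.backward()                       # traverses and frees this step's graph only
    optimizer.step()
    losses.append(loss.item())            # keep a plain float, not the tensor
```

Because the forward pass sits inside the loop, every backward() call sees a graph that has never been traversed before.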

Correct Fix 2: Use retain_graph=True Only When You Truly Need It

Sometimes you intentionally need more than one backward pass through the same graph. In that case, ask PyTorch to retain it on the earlier backward call.

python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x

y.backward(retain_graph=True)
print("After first backward:", x.grad)

y.backward()
print("After second backward:", x.grad)

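This works, but it is not a free solution. Gradients accumulate in x.grad, so the second print shows 8.0 rather than 4.0 unless you zero the gradient between calls. Retaining the graph also keeps its saved intermediate tensors in memory, so it should be reserved for cases where a second traversal is actually necessary.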

Common examples include:

  • multiple losses sharing the same forward graph
  • higher-order gradient calculations (typically via create_graph=True, which also retains the graph)
  • custom research code that intentionally reuses graph structure
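Of the cases listed above, higher-order gradients are the easiest to show in a few lines. The sketch below uses torch.autograd.grad with create_graph=True; the function y = x ** 3 and the variable names are assumptions made for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the gradient computation itself so that it can be
# differentiated again; retain_graph defaults to the same value.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)   # 3 * x**2
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)                 # 6 * x

print(dy_dx.item(), d2y_dx2.item())
```

Here the second differentiation is the whole point of the computation, which is the kind of situation where keeping the graph alive is justified.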

Correct Fix 3: Detach State That Should Not Keep History

Another common source of this error is carrying tensors with gradient history across loop iterations when you really meant to treat them as plain values.

python
import torch

hidden = torch.zeros(1, requires_grad=True)

for _ in range(3):
    hidden = hidden.detach()  # drop any history from the previous iteration
    hidden.requires_grad_()   # track gradients again, starting from this step

    loss = (hidden + 1).sum()
    loss.backward()

Detaching breaks the history chain. This is especially important in recurrent models, custom optimization loops, and stateful training code where one step should not backpropagate through all previous steps accidentally.
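The sketch below contrasts the two behaviours with a state that is updated from a shared weight on every step, loosely mimicking a recurrent update. The helper run_steps and its detach_state flag are hypothetical names used only for this illustration.

```python
import torch

def run_steps(detach_state):
    """Return True if three consecutive backward passes succeed."""
    weight = torch.tensor(0.5, requires_grad=True)
    state = torch.tensor(1.0)
    try:
        for _ in range(3):
            if detach_state:
                state = state.detach()   # cut the link to the previous step's graph
            state = state * weight       # new state depends on the old state
            loss = (state - 2.0) ** 2
            loss.backward()              # without detach, step 2 reaches step 1's freed graph
    except RuntimeError:
        return False
    return True

carried_ok = run_steps(detach_state=False)
detached_ok = run_steps(detach_state=True)
print(carried_ok, detached_ok)
```

Without the detach, the second iteration's loss is connected to the first iteration's graph through state, so its backward pass runs into buffers that were already freed.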

Multiple Losses on the Same Forward Pass

Suppose you compute two losses from one forward pass:

python
import torch

x = torch.tensor(2.0, requires_grad=True)
shared = x * 3
loss1 = shared ** 2
loss2 = shared + 5

loss1.backward(retain_graph=True)
loss2.backward()

This is valid because the first backward keeps the graph alive for the second one.

An even cleaner approach is often to combine the losses and call backward() once:

python
total_loss = loss1 + loss2
total_loss.backward()

That is usually simpler and more memory-efficient.
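As a quick check that the two approaches agree, the sketch below computes the gradient both ways on the same toy expressions. The helper function names are assumptions made for illustration.

```python
import torch

def separate_backward():
    x = torch.tensor(2.0, requires_grad=True)
    shared = x * 3
    loss1 = shared ** 2
    loss2 = shared + 5
    loss1.backward(retain_graph=True)   # keep the shared graph for the second call
    loss2.backward()                    # gradients accumulate into x.grad
    return x.grad.item()

def combined_backward():
    x = torch.tensor(2.0, requires_grad=True)
    shared = x * 3
    total_loss = shared ** 2 + (shared + 5)
    total_loss.backward()               # single traversal, no retain_graph needed
    return x.grad.item()

grad_separate = separate_backward()
grad_combined = combined_backward()
print(grad_separate, grad_combined)
```

Both paths produce the same gradient because backward() accumulates into x.grad, and the gradient of a sum is the sum of the gradients.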

When the Problem Is an Accumulating Tensor

Another easy trap is doing something like this in a loop:

python
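# model and data are assumed to be defined elsewhere
running_loss = 0

for batch in data:
    loss = model(batch).sum()
    running_loss += loss  # running_loss becomes a tensor tied to every batch's graph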

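If running_loss keeps collecting tensors that still belong to autograd graphs, every batch's graph stays reachable through it, so memory grows with each iteration. A later backward() call that reaches running_loss would also try to traverse graphs that earlier backward calls have already freed. Often the correct pattern is: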

python
running_loss += loss.item()

That converts the tensor to a plain Python number for logging.
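A self-contained version of the logging pattern might look like the sketch below. The linear model and the random batches are hypothetical stand-ins for a real model and dataset.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for a real model and dataset.
model = torch.nn.Linear(4, 1)
data = [torch.randn(8, 4) for _ in range(3)]

running_loss = 0.0
for batch in data:
    model.zero_grad()
    loss = model(batch).sum()
    loss.backward()
    running_loss += loss.item()   # plain Python float, no graph attached

print(type(running_loss))
```

Since running_loss never holds a tensor, each batch's graph can be released as soon as loss is reassigned on the next iteration.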

Common Pitfalls

One common mistake is adding retain_graph=True everywhere to silence the error. That often hides the real bug and wastes memory.

Another issue is forgetting that a fresh training iteration normally needs a fresh forward pass before the next backward call.

Developers also often hit this error when reusing hidden state or cached tensors from a previous iteration without detaching them first.

Finally, logging tensors instead of scalar values can accidentally keep graph references alive much longer than intended. Use .item() when you only need the number.

Summary

  • The error means autograd is being asked to traverse a graph that has already been freed.
  • The normal fix is to recompute the forward pass before calling backward() again.
  • Use retain_graph=True only when multiple backward passes through the same graph are truly required.
  • Use detach() when state should not carry gradient history into later iterations.
  • Prefer one combined loss and one backward call when possible.
