RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True
Introduction
This PyTorch error means you are trying to reuse a computation graph after backward() has already consumed and freed the saved intermediate values needed for gradient calculation. In most training code, the fix is not "always add retain_graph=True." The real fix is to understand why the graph is being reused and choose the right pattern for that situation.
Why PyTorch Frees the Graph
PyTorch builds a dynamic computation graph during the forward pass. When you call loss.backward(), autograd walks that graph in reverse, computes gradients, and then frees saved buffers to conserve memory.
That is why this works once:
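A minimal sketch of a single backward pass, assuming a toy tensor `x` and a squared-sum loss chosen only for illustration:

```python
import torch

# x ** 2 needs x itself during backward, so autograd saves it as a buffer.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()

# First backward pass: gradients are computed, then the saved buffers are freed.
loss.backward()
print(x.grad)  # gradient of sum(x^2) with respect to x, i.e. 2 * x
```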
But this fails:
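A small reproduction, again using an illustrative toy tensor rather than code from a real project. The second call is wrapped in `try`/`except` only so the script keeps running:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()

loss.backward()  # first pass succeeds and frees the saved buffers

second_call_failed = False
message = ""
try:
    loss.backward()  # same graph, but its buffers are already gone
except RuntimeError as err:
    second_call_failed = True
    message = str(err)
    print(f"RuntimeError: {message}")
```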
After the first backward pass, PyTorch has already released the information it needed from that graph.
The Most Common Cause
The error usually appears when code accidentally calls backward() twice on values produced by the same forward pass.
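One way this can look in training code is a forward pass that sits outside the loop by mistake. The sketch below assumes a hypothetical `nn.Linear` model and random data, and records the step at which the failure occurs:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # illustrative model, not from the article
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

# Bug: the forward pass runs only once, before the loop.
loss = nn.functional.mse_loss(model(inputs), targets)

failed_at_step = None
for step in range(2):
    optimizer.zero_grad()
    try:
        loss.backward()  # step 0 succeeds; step 1 reuses the freed graph
    except RuntimeError:
        failed_at_step = step
        break
    optimizer.step()

print(f"backward() failed at step {failed_at_step}")
```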
The second call fails because loss still points to the old graph, and that graph's saved buffers were freed by the first backward pass.
Correct Fix 1: Recompute the Forward Pass
In normal training, the usual fix is to run the forward computation again before the next backward call.
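A minimal sketch of that pattern, assuming an illustrative `nn.Linear` model, random data, and plain SGD. The key point is that the forward pass sits inside the loop:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # illustrative model, not from the article
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

losses = []
for step in range(3):
    optimizer.zero_grad()
    # Fresh forward pass, so each iteration gets its own graph.
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()   # consumes and frees only this iteration's graph
    optimizer.step()
    losses.append(loss.item())

print(losses)
```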
This is the standard pattern because each optimization step should normally build a fresh graph from a fresh forward pass.
Correct Fix 2: Use retain_graph=True Only When You Truly Need It
Sometimes you intentionally need more than one backward pass through the same graph. In that case, ask PyTorch to retain it on the earlier backward call.
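A small sketch with an illustrative toy tensor. Note that gradients accumulate in `.grad`, so two passes over the same graph add up:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()

# Keep the saved buffers alive after this pass.
loss.backward(retain_graph=True)
first_grad = x.grad.clone()

# Second traversal of the same graph; buffers are freed after this call.
loss.backward()

print(first_grad)  # gradient from the first pass
print(x.grad)      # accumulated gradient from both passes
```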
This works, but it is not a free solution. Retaining the graph uses more memory, so it should be reserved for cases where a second traversal is actually necessary.
Common examples include:
- multiple losses sharing the same forward graph
- higher-order gradient calculations
- custom research code that intentionally reuses graph structure
Correct Fix 3: Detach State That Should Not Keep History
Another common source of this error is carrying tensors with gradient history across loop iterations when you really meant to treat them as plain values.
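The sketch below assumes a hypothetical recurrent setup with `nn.RNNCell` and random inputs. Without the `detach()` call, the second iteration would try to backpropagate into the first iteration's freed graph:

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=3, hidden_size=5)  # illustrative model
optimizer = torch.optim.SGD(cell.parameters(), lr=0.01)
hidden = torch.zeros(1, 5)

steps_completed = 0
for step in range(3):
    optimizer.zero_grad()
    hidden = cell(torch.randn(1, 3), hidden)
    loss = hidden.pow(2).mean()  # toy objective for demonstration only
    loss.backward()
    optimizer.step()
    # Keep the values, drop the gradient history before the next step.
    hidden = hidden.detach()
    steps_completed += 1

print(steps_completed, hidden.requires_grad)
```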
Detaching breaks the history chain. This is especially important in recurrent models, custom optimization loops, and stateful training code where one step should not backpropagate through all previous steps accidentally.
Multiple Losses on the Same Forward Pass
Suppose you compute two losses from one forward pass:
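A sketch of two separate backward calls over one shared forward pass. The model, the data, and the names `loss_a` and `loss_b` are illustrative assumptions:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # illustrative model
inputs, targets = torch.randn(8, 4), torch.randn(8, 2)

outputs = model(inputs)  # one forward pass shared by both losses
loss_a = nn.functional.mse_loss(outputs, targets)
loss_b = outputs.abs().mean()

# The shared part of the graph must survive the first backward call.
loss_a.backward(retain_graph=True)
loss_b.backward()  # last use, so the graph can be freed here

print(model.weight.grad.shape)
```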
This is valid as long as the first backward call passes retain_graph=True, which keeps the graph alive for the second one.
An even cleaner approach is often to combine the losses and call backward() once:
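A sketch of the combined version, using the same illustrative model and losses. No `retain_graph=True` is needed because the graph is traversed only once:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # illustrative model
inputs, targets = torch.randn(8, 4), torch.randn(8, 2)

outputs = model(inputs)
loss_a = nn.functional.mse_loss(outputs, targets)
loss_b = outputs.abs().mean()

# Sum the losses (optionally with weights) and backpropagate once.
total_loss = loss_a + loss_b
total_loss.backward()

print(model.weight.grad.shape)
```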
That is usually simpler and more memory-efficient.
When the Problem Is an Accumulating Tensor
Another easy trap is doing something like this in a loop:
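A sketch of the trap, assuming an illustrative model and random batches. The loop runs without raising, but `running_loss` silently becomes a tensor that is still attached to autograd history:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # illustrative model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(3):
    inputs, targets = torch.randn(8, 4), torch.randn(8, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    running_loss += loss  # trap: adds a graph-attached tensor, not a number

# running_loss is now a tensor with a grad_fn, so it references old graphs.
print(type(running_loss).__name__, running_loss.grad_fn is not None)
```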
If running_loss keeps collecting tensors that still belong to autograd graphs, you may accidentally hold onto old graphs or reuse them unexpectedly. Often the correct pattern is:
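A sketch of the logging-safe version under the same illustrative setup. The only change is calling `.item()` before accumulating:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # illustrative model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(3):
    inputs, targets = torch.randn(8, 4), torch.randn(8, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()  # plain float, no graph reference kept

print(type(running_loss).__name__)
```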
That converts the tensor to a plain Python number for logging.
Common Pitfalls
One common mistake is adding retain_graph=True everywhere to silence the error. That often hides the real bug and wastes memory.
Another issue is forgetting that a fresh training iteration normally needs a fresh forward pass before the next backward call.
Developers also often hit this error when reusing hidden state or cached tensors from a previous iteration without detaching them first.
Finally, logging tensors instead of scalar values can accidentally keep graph references alive much longer than intended. Use .item() when you only need the number.
Summary
- The error means autograd is being asked to traverse a graph that has already been freed.
- The normal fix is to recompute the forward pass before calling backward() again.
- Use retain_graph=True only when multiple backward passes through the same graph are truly required.
- Use detach() when state should not carry gradient history into later iterations.
- Prefer one combined loss and one backward call when possible.

