How to interpret caffe log with debug_info?

Introduction

When debug_info: true is set in a Caffe solver file, the training log becomes far more detailed than the usual iteration and loss output. The extra lines report the magnitude of what each layer produces and of the gradients flowing backward, which helps diagnose exploding activations, dead layers, broken connections, and unstable optimization.
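As a minimal sketch of switching the flag on, the helper below edits solver text so that it contains debug_info: true. The function name enable_debug_info and the example solver contents are hypothetical and only for illustration; debug_info itself is the solver field discussed above.

```python
import re

# Matches an existing (uncommented) debug_info setting at the start of a line.
_DEBUG_FIELD = re.compile(r"^[ \t]*debug_info[ \t]*:[ \t]*\w+", re.MULTILINE)

def enable_debug_info(solver_text: str) -> str:
    """Return solver prototxt text with debug_info set to true."""
    if _DEBUG_FIELD.search(solver_text):
        # Overwrite the existing value instead of adding a duplicate field.
        return _DEBUG_FIELD.sub("debug_info: true", solver_text)
    if solver_text and not solver_text.endswith("\n"):
        solver_text += "\n"
    return solver_text + "debug_info: true\n"

# Hypothetical solver contents, for illustration only.
example = 'net: "train_val.prototxt"\nbase_lr: 0.01\nmax_iter: 200\n'
updated = enable_debug_info(example)
print(updated)
```

Writing the result to a separate solver file keeps the verbose setting out of the configuration used for long runs.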

What debug_info Adds

Without debug info, a Caffe log mainly shows iteration numbers, learning rate, and loss values. With debug info enabled, Caffe also prints per-layer information during forward and backward passes.

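You will often see lines that describe:

  • the mean absolute value of each layer's top blobs during the forward pass
  • the mean absolute value of each layer's parameter blobs
  • the output of loss layers, reported like any other top blob
  • the mean absolute diff (gradient) of bottom blobs, weights, and biases during the backward pass

The exact wording can vary between Caffe builds and forks, but the pattern is consistent: forward pass lines describe activation magnitudes, and backward pass lines describe gradient magnitudes. Because each line is a single summary number per blob, it shows scale, not individual values.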
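To work with these lines programmatically, a small parser helps. The sketch below assumes debug lines of the form "[Forward] Layer NAME, top blob NAME data: VALUE" and "[Backward] Layer NAME, param blob ID diff: VALUE", which is how the reference BVLC build is understood to print them; treat the pattern as an assumption and adjust it to your own log. The names DebugRecord and parse_debug_line, and the numbers in the sample line, are illustrative.

```python
import re
from typing import NamedTuple, Optional

# Assumed line layout; verify against your own Caffe build before relying on it.
DEBUG_LINE = re.compile(
    r"\[(?P<phase>Forward|Backward)\] Layer (?P<layer>\S+), "
    r"(?P<kind>top blob|bottom blob|param blob) (?P<blob>\S+) "
    r"(?P<field>data|diff): (?P<value>\S+)"
)

class DebugRecord(NamedTuple):
    phase: str    # "Forward" or "Backward"
    layer: str    # layer name
    kind: str     # "top blob", "bottom blob", or "param blob"
    blob: str     # blob name or parameter index
    field: str    # "data" or "diff"
    value: float  # summary magnitude printed by Caffe

def parse_debug_line(line: str) -> Optional[DebugRecord]:
    """Return a DebugRecord for a debug line, or None for any other line."""
    m = DEBUG_LINE.search(line)
    if m is None:
        return None
    try:
        value = float(m["value"])
    except ValueError:
        return None
    return DebugRecord(m["phase"], m["layer"], m["kind"], m["blob"], m["field"], value)

# Illustrative line with made-up numbers.
sample = "net.cpp]     [Forward] Layer conv1, top blob conv1 data: 0.0645037"
print(parse_debug_line(sample))
```

Lines that do not match, such as the iteration header, return None and can simply be skipped.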

Reading the Log in Sections

A useful way to read a debug log is to split it into four parts:

  1. iteration header
  2. forward pass
  3. loss reporting
  4. backward pass

Conceptually, it looks like:

```text
Iteration 100, loss = 1.84
Train net output #0: loss = 1.84
Forward layer conv1 ...
Backward layer conv1 ...
```

The forward section tells you what the network computed. The backward section tells you how the error signal is flowing back for learning.
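The iteration header is the easiest part to extract. The sketch below collects the reported loss per iteration from lines shaped like "Iteration 100, loss = 1.84". The optional parenthesized group in the pattern is an assumption that some builds print timing information after the iteration number; losses_by_iteration is a hypothetical helper name.

```python
import re

# The "(...)" group is optional and only an assumption about some builds.
ITER_LINE = re.compile(r"Iteration (\d+)(?: \([^)]*\))?, loss = (\S+)")

def losses_by_iteration(log_lines):
    """Map iteration number -> reported loss for every header line found."""
    losses = {}
    for line in log_lines:
        m = ITER_LINE.search(line)
        if m:
            losses[int(m.group(1))] = float(m.group(2))
    return losses

# Illustrative lines in the same conceptual style as the log sketch above.
sample = [
    "Iteration 0, loss = 2.30",
    "Forward layer conv1 ...",
    "Backward layer conv1 ...",
    "Iteration 100, loss = 1.84",
]
history = losses_by_iteration(sample)
print(history)
```

Having the loss history in a dictionary makes it easy to line up a suspicious iteration with the per-layer lines printed around it.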

What to Look for in Forward Passes

Forward-pass debug lines help answer questions such as:

  • are activations all zero
  • are they extremely large
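  • does a layer output the expected shape (the debug lines report magnitudes only, so check the "Top shape" lines Caffe prints while setting up the net)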

For example, if a ReLU layer outputs almost all zeros across many iterations, you may have a dead activation problem caused by initialization, learning rate, or bad input scaling.

If values explode in early layers, later layers and loss terms often become unstable too. Debug info helps you locate where the range first becomes unreasonable.
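Locating where the range first becomes unreasonable can be sketched as a scan in forward order. The function below assumes each forward line has already been reduced to a (layer name, magnitude) pair; the thresholds low and high are arbitrary placeholders that need tuning for a given network, and the sample numbers are made up.

```python
import math

def first_suspicious_activation(forward_values, low=1e-8, high=1e3):
    """Return (layer, reason) for the first out-of-range layer, else None.

    forward_values: iterable of (layer_name, magnitude) in forward order.
    low/high: placeholder thresholds, not values recommended by Caffe.
    """
    for layer, value in forward_values:
        if math.isnan(value) or math.isinf(value):
            return layer, "non-finite"
        if value > high:
            return layer, "too large"
        if value < low:
            return layer, "near zero"
    return None

# Illustrative magnitudes for one iteration.
one_pass = [("conv1", 0.8), ("relu1", 0.4), ("conv2", 5200.0), ("fc1", float("inf"))]
print(first_suspicious_activation(one_pass))
```

Stopping at the first offending layer matters because everything downstream of it is usually affected too.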

What to Look for in Backward Passes

Backward debug lines usually matter even more. They help detect:

  • vanishing gradients
  • exploding gradients
  • parameters that never update
  • branches that are disconnected from the loss

If a layer's parameter diffs stay near zero for a long time, that layer may not be learning meaningfully. If diffs are extremely large, optimization may be unstable and the learning rate may be too high.

This is especially useful in custom networks where a wrong bottom or top connection can silently prevent gradient flow.
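A simple way to spot parameters that never update is to track the largest diff seen for each parameter over many iterations. The sketch below assumes the backward lines have been reduced to (iteration, layer, parameter index, diff) tuples; stalled_params and the eps cutoff are hypothetical choices.

```python
import math
from collections import defaultdict

def stalled_params(records, eps=1e-12):
    """Return (layer, param_id) pairs whose diff never exceeded eps.

    records: iterable of (iteration, layer, param_id, diff_value).
    """
    largest = defaultdict(float)
    for _iteration, layer, param_id, diff in records:
        key = (layer, param_id)
        # Treat NaN as "not stalled" so it is reported by other checks instead.
        magnitude = float("inf") if math.isnan(diff) else abs(diff)
        largest[key] = max(largest[key], magnitude)
    return sorted(key for key, value in largest.items() if value <= eps)

# Illustrative diffs for two iterations.
records = [
    (0, "conv1", 0, 0.0), (1, "conv1", 0, 0.0),
    (0, "fc1", 0, 0.01), (1, "fc1", 0, 0.0),
]
print(stalled_params(records))
```

A layer that shows up here is worth checking for a wrong bottom or top name, or for a learning-rate multiplier that freezes it on purpose.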

A Practical Interpretation Pattern

Suppose training loss is not improving. Read the log in this order:

  1. confirm the loss changes over iterations
  2. inspect forward outputs in the earliest suspicious layer
  3. inspect backward diffs for that layer and the next few parameterized layers
  4. compare whether the signal dies, explodes, or stays healthy

If activations look normal but gradients vanish later, the issue may be architecture or saturation. If gradients are healthy but the loss is noisy and unstable, the optimizer or learning rate may be the problem.

Typical Signals and Their Meaning

Some rough interpretations are:

  • activations all near zero: dead units, bad scaling, or strong regularization
  • very large activations: unstable initialization or learning rate
  • parameter diffs always zero: disconnected gradient path or frozen parameters
  • loss spikes with huge diffs: optimizer instability

These are clues, not guarantees. The log gives evidence, and you still need to connect that evidence to the network design.
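The rough interpretations above can be written down as a small rule table. This is only a sketch: rough_hints is a hypothetical helper, every threshold is a placeholder, and the returned strings are prompts for further investigation rather than diagnoses.

```python
def rough_hints(activation, param_diff, loss_spiked=False,
                act_low=1e-6, act_high=1e3, diff_high=1e2):
    """Map one layer's summary magnitudes to the rough interpretations above.

    All thresholds are placeholders and need tuning per network.
    """
    hints = []
    if activation < act_low:
        hints.append("activations near zero: dead units, bad scaling, or strong regularization")
    elif activation > act_high:
        hints.append("very large activations: unstable initialization or learning rate")
    if param_diff == 0.0:
        hints.append("parameter diffs zero: disconnected gradient path or frozen parameters")
    elif param_diff > diff_high and loss_spiked:
        hints.append("loss spike with huge diffs: optimizer instability")
    return hints

# Illustrative values for a single layer.
print(rough_hints(activation=0.0, param_diff=0.0))
print(rough_hints(activation=0.5, param_diff=0.01))
```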

Common Pitfalls

One common mistake is staring only at the total loss and ignoring layer-level debug output. The total loss tells you that something is wrong, but not where.

Another mistake is overreacting to one noisy iteration. Deep learning logs fluctuate. Look for trends across many iterations rather than single surprising values.
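One way to look at trends instead of single values is to smooth a per-layer series before judging it. The sketch below uses a rolling median, which is a choice made here for illustration because it ignores isolated spikes; the window size is arbitrary.

```python
from statistics import median

def rolling_median(values, window=5):
    """Median of the last `window` values at each position."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(median(values[start:i + 1]))
    return smoothed

# Illustrative series with one noisy iteration.
diffs = [1, 1, 100, 1, 1]
print(rolling_median(diffs, window=3))
```

If the smoothed series is flat, the spike was probably noise; if it keeps climbing, the problem is more likely real.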

People also confuse small gradients with healthy convergence. Sometimes the network is converging; other times it is stuck. The surrounding context, loss trend, and activation behavior matter.

Finally, debug logs are verbose enough to slow down training and overwhelm storage. Use them for diagnosis, not as a permanent default setting in long training runs.

Summary

  • debug_info adds per-layer forward and backward details to Caffe training logs.
  • Forward-pass lines help diagnose activation scale problems; shapes are reported separately, when the net is set up.
  • Backward-pass lines help diagnose vanishing, exploding, or missing gradients.
  • Read the log as a flow: iteration, forward, loss, backward.
  • Use debug logs selectively to locate failure points, not just to collect more text.
