Default Adam optimizer doesn't work in tf.keras but string adam does
Introduction
The issue where tf.keras.optimizers.Adam() behaves differently from the string "adam" in model.compile() is attributed to TensorFlow shipping two Adam implementations. From TF 2.11, tf.keras.optimizers.Adam points to the new optimizer implementation (previously exposed as tf.keras.optimizers.experimental.Adam), while the string "adam" can resolve to the legacy tf.keras.optimizers.legacy.Adam. The new optimizer differs in variable handling, checkpoint layout and some defaults (for example, jit_compile=True), which can cause training to fail or produce poor results.
The Problem
Passing optimizer="adam" and passing optimizer=tf.keras.optimizers.Adam() to model.compile() should be equivalent, but in TF 2.11-2.14 they may resolve to different implementations.
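The two calls being compared are not shown in this section. A minimal sketch of them follows, assuming a toy model whose architecture and loss are arbitrary placeholders. Printing the optimizer class on your own install shows whether the two paths agree.

```python
import tensorflow as tf


def build_model():
    # Tiny illustrative model; the architecture is an arbitrary placeholder.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])


# Path 1: let Keras construct the optimizer from its string name.
model_from_string = build_model()
model_from_string.compile(optimizer="adam", loss="mse")

# Path 2: construct the optimizer class explicitly.
model_from_class = build_model()
model_from_class.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")

# Report which class each path actually produced.
for label, model in [("string", model_from_string), ("class", model_from_class)]:
    cls = type(model.optimizer)
    print(f"{label}: {cls.__module__}.{cls.__qualname__}")
```

If both lines print the same module path, the string and the class resolve to the same implementation on that install.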
Why It Happens
TensorFlow 2.11 made the new optimizer classes the default under tf.keras.optimizers and moved the old optimizers to tf.keras.optimizers.legacy.
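As a rough illustration of the namespace layout, the sketch below maps a TensorFlow version string to what the two import paths refer to. The helper name adam_layout is hypothetical, and the version boundaries assume a default install (Keras 2 bundled through TF 2.15, Keras 3 by default from TF 2.16).

```python
def adam_layout(tf_version):
    """Describe what the top-level and legacy Adam paths refer to (hypothetical helper)."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if (major, minor) < (2, 11):
        # Before the switch: the top-level path is the original implementation.
        return {"tf.keras.optimizers.Adam": "original implementation"}
    if (major, minor) < (2, 16):
        # Keras 2 after the switch: new class on top, original one under `legacy`.
        return {
            "tf.keras.optimizers.Adam": "new implementation",
            "tf.keras.optimizers.legacy.Adam": "original implementation",
        }
    # Assumption: default install, where tf.keras is Keras 3.
    return {"tf.keras.optimizers.Adam": "Keras 3 implementation"}


for version in ("2.10.1", "2.13.0", "2.16.1"):
    print(version, adam_layout(version))
```

In practice you would pass tf.__version__ rather than a hard-coded string.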
Fix 1: Use the Legacy Optimizer
Instantiate tf.keras.optimizers.legacy.Adam() and pass it to model.compile(). This gives you the same optimizer that the string "adam" resolves to.
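A minimal sketch of this fix, assuming a placeholder model and loss. The fallback branch is an addition for illustration only: it keeps the script running on installs where the legacy optimizers cannot be constructed, and is not part of the fix itself.

```python
import tensorflow as tf

try:
    # Works on Keras 2 releases that ship the legacy namespace.
    optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=1e-3)
except (AttributeError, ImportError):
    # Assumption: on installs without usable legacy optimizers (for example
    # Keras 3), fall back to the default class so the script still runs.
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Placeholder model; only the optimizer argument matters here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=optimizer, loss="mse")
print(type(model.optimizer))
```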
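Fix 2: Use the String
Pass optimizer="adam" to model.compile() and let Keras construct the optimizer. The string form always uses default hyperparameters, so it only fits when no custom learning rate or other arguments are needed.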
Fix 3: Set the Learning Rate Explicitly
Setting the learning rate explicitly, rather than relying on the defaults of whichever implementation is selected, often fixes training issues. Both implementations document a default learning_rate of 0.001.
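A short sketch of passing the learning rate explicitly. The value 1e-3 and the model are placeholders, not recommendations.

```python
import tensorflow as tf

# Explicit learning rate instead of relying on the class default.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Placeholder model; only the optimizer argument matters here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=optimizer, loss="mse")

# get_config() reports the hyperparameters the optimizer was built with.
print(optimizer.get_config()["learning_rate"])
```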
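Fix 4: Use Environment Variable (TF 2.16+)
TF_USE_LEGACY_KERAS applies to TensorFlow 2.16 and later, where tf.keras defaults to Keras 3. With the tf-keras package installed, setting TF_USE_LEGACY_KERAS=1 makes tf.keras load Keras 2 again, which brings back tf.keras.optimizers.legacy. Set the variable before TensorFlow is imported, either at the top of the script or in the shell before running.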
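One way to set the variable from Python is sketched below. The helper name set_legacy_keras_flag is hypothetical. It also reports whether TensorFlow had already been imported, since in that case the setting comes too late to take effect.

```python
import os
import sys


def set_legacy_keras_flag():
    """Set TF_USE_LEGACY_KERAS=1 (hypothetical helper).

    Returns False if TensorFlow was already imported in this process,
    because the variable is only read when TensorFlow loads Keras.
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    return not already_imported


in_time = set_legacy_keras_flag()
print("set before TensorFlow import:", in_time)

# Import TensorFlow only after the call above, for example:
# import tensorflow as tf
```

The same effect can be had by exporting the variable in the shell before launching the script.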
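Fix 5: Upgrade to TF 2.16+
TF 2.16 and later default to Keras 3, which ships a single Adam implementation; the legacy optimizers are not supported there (TF 2.15 still includes tf.keras.optimizers.legacy). Upgrading removes the inconsistency, but code that references the legacy namespace has to be migrated.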
Differences Between Old and New Adam
| Feature | Legacy Adam | New Adam (TF 2.11+) |
| --- | --- | --- |
| Weight decay | Not built-in | Built-in via weight_decay param |
| EMA (Exponential Moving Average) | Not available | Built-in via use_ema param |
| Learning rate schedule | Pass a LearningRateSchedule as learning_rate | Same |
| Gradient clipping | Manual via clipnorm/clipvalue | Same |
| Mixed precision | Works | Improved support |
| Save/Load | Stable | Different checkpoint format |
Checking Which Optimizer You Have
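A small sketch for inspecting the optimizer attached to a compiled model. The helper names are hypothetical, and the functions rely only on the model exposing an optimizer attribute, which compiled Keras models do.

```python
def optimizer_class_path(model):
    """Return the fully qualified class name of a compiled model's optimizer."""
    cls = type(model.optimizer)
    return f"{cls.__module__}.{cls.__qualname__}"


def same_implementation(model_a, model_b):
    """True when both models were compiled with the same optimizer class."""
    return type(model_a.optimizer) is type(model_b.optimizer)
```

Call optimizer_class_path(model) after model.compile(...). The exact module path varies between releases, so comparing two models with same_implementation is more reliable than matching on a particular path string.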
Model Save/Load Compatibility
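Optimizer state saved by one implementation may not load into the other, because the two use different checkpoint formats. To avoid issues, save with the same optimizer implementation you plan to load with.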
Common Pitfalls
- Assuming string and class are identical: In TF 2.11-2.14, "adam" and tf.keras.optimizers.Adam() can be different classes. Always verify which implementation you are using with type(optimizer).
- Mixing legacy and new optimizers in a project: Using legacy Adam for some models and new Adam for others causes confusion and checkpoint incompatibility. Pick one and be consistent.
- Ignoring the TF_USE_LEGACY_KERAS variable: This environment variable must be set before importing TensorFlow. Setting it after import tensorflow has no effect.
- Weight decay confusion: The new Adam with weight_decay applies decoupled weight decay (AdamW), which is different from L2 regularization. Using both weight_decay in the optimizer and kernel_regularizer=l2() in layers applies regularization twice.
- Custom training loops: The new optimizer has a different API for apply_gradients. Code that accesses internal optimizer state (like optimizer._decayed_lr) may break when switching from legacy to new.
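To make the weight decay pitfall concrete, the sketch below runs a single Adam step on one scalar weight in plain Python. It is a simplified illustration, not Keras's implementation: moments start at zero, the data gradient is assumed to be zero, and the L2 penalty is taken as lam * w**2 so that it contributes 2 * lam * w to the gradient.

```python
import math


def adam_first_step(w, grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7,
                    weight_decay=0.0):
    """One Adam step for a scalar weight, starting from zero moments."""
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1)  # bias correction at step 1
    v_hat = v / (1 - beta2)
    # Decoupled decay shrinks the weight directly and bypasses the moments.
    w = w - lr * weight_decay * w
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


w0, lam = 2.0, 0.01

# L2 regularization: the penalty enters through the gradient, and Adam
# rescales it like any other gradient.
after_l2 = adam_first_step(w0, grad=2 * lam * w0)

# Decoupled weight decay: the gradient is untouched, the weight shrinks by
# a fixed fraction.
after_decoupled = adam_first_step(w0, grad=0.0, weight_decay=lam)

print(after_l2, after_decoupled)
```

With these numbers the L2 route moves the weight to about 1.9, while decoupled decay moves it to 1.998. Adam's normalization turns the small L2 gradient into a full-size step, which is why the two forms of regularization are not interchangeable.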
Summary
- TF 2.11 to 2.15 ship two Adam implementations: new (tf.keras.optimizers.Adam) and legacy (tf.keras.optimizers.legacy.Adam)
- The string "adam" can resolve to legacy in TF 2.11-2.14, causing a mismatch with the class
- Use tf.keras.optimizers.legacy.Adam() or the string "adam" for backward-compatible behavior
- Upgrade to TF 2.16+, where Keras 3 ships a single implementation and the inconsistency is resolved
- The new Adam adds weight decay and EMA support but has a different checkpoint format

