Default Adam optimizer doesn't work in tf.keras but string adam does

Introduction

When tf.keras.optimizers.Adam() behaves differently from the string "adam" in model.compile(), the usual cause is that TensorFlow ships two Adam implementations. Since TF 2.11, tf.keras.optimizers.Adam points to the rewritten optimizer (previously exposed as tf.keras.optimizers.experimental.Adam), while the original implementation lives on as tf.keras.optimizers.legacy.Adam. Depending on the TensorFlow version and platform, the string "adam" can resolve to the legacy class, so the two spellings end up running different code. The new optimizer handles variables, weight decay, and checkpoints differently, which can make training fail or produce poor results in code written for the old one.

The Problem

python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1)
])

# This might not train properly
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')

# This works fine
model.compile(optimizer='adam', loss='mse')

The two calls look equivalent, but in TF 2.11 through 2.15 they can resolve to different implementations.

Why It Happens

TensorFlow 2.11 introduced new optimizer classes under tf.keras.optimizers. The old optimizers were moved to tf.keras.optimizers.legacy:

python
# TF 2.10 and earlier: tf.keras.optimizers.Adam is the original implementation
tf.keras.optimizers.Adam  # The only Adam

# TF 2.11 to 2.15: two different Adams
tf.keras.optimizers.Adam           # NEW implementation
tf.keras.optimizers.legacy.Adam    # OLD (proven) implementation

# The string 'adam' maps to:
# TF 2.11 to 2.15: may resolve to legacy.Adam, depending on version and platform
# TF 2.16+ (Keras 3): the new Adam (legacy optimizers are no longer supported)
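
The version boundaries above can be captured in a small lookup. This is only a sketch: keras_optimizer_era is a hypothetical helper name, not a TensorFlow API, and it assumes the boundaries are TF 2.11 (new optimizers become the default) and TF 2.16 (Keras 3 becomes the default).

```python
def keras_optimizer_era(tf_version: str) -> str:
    """Classify a TensorFlow version string by the optimizer setup it ships.

    Hypothetical helper for illustration; the labels are made up here.
    """
    # Only major.minor matter, so "2.16.0-rc0" and "2.16.1" behave the same.
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if (major, minor) < (2, 11):
        return "original"    # tf.keras.optimizers.Adam is the original optimizer
    if (major, minor) < (2, 16):
        return "new+legacy"  # new Adam by default, original under optimizers.legacy
    return "keras3"          # Keras 3 by default; legacy optimizers unsupported


for version in ("2.10.1", "2.13.0", "2.16.1"):
    print(version, "->", keras_optimizer_era(version))
```

In a real script you would pass tf.__version__ to the helper rather than a hard-coded string.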

Fix 1: Use the Legacy Optimizer

python
# Explicitly use the legacy optimizer
model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001),
    loss='mse'
)

This pins the original implementation explicitly, so behavior no longer depends on how the string "adam" is resolved.

Fix 2: Use the String

python
# Let TensorFlow resolve to the correct implementation
model.compile(optimizer='adam', loss='mse')

# With custom learning rate via string
model.compile(
    optimizer=tf.keras.optimizers.deserialize({
        'class_name': 'adam',
        'config': {'learning_rate': 0.001}
    }),
    loss='mse'
)

Fix 3: Set the Learning Rate Explicitly

Both implementations default to learning_rate=0.001, but spelling out the hyperparameters removes any doubt about which values are in effect and makes runs with the two implementations directly comparable:

python
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7
)
model.compile(optimizer=optimizer, loss='mse')
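
To see what these four hyperparameters do, here is a minimal sketch of one Adam update for a single scalar weight. It follows the textbook formulation of Adam and is not Keras's actual code (Keras places epsilon slightly differently); adam_step is a hypothetical name used only for illustration.

```python
import math


def adam_step(grad, m, v, t, learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One textbook Adam update for a scalar; returns (delta, new_m, new_v)."""
    m = beta_1 * m + (1.0 - beta_1) * grad         # running mean of gradients
    v = beta_2 * v + (1.0 - beta_2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1.0 - beta_1 ** t)                # bias correction for step t
    v_hat = v / (1.0 - beta_2 ** t)
    delta = -learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return delta, m, v


# First step (t=1) from zero moments, with a tiny and a huge gradient.
small, _, _ = adam_step(grad=0.01, m=0.0, v=0.0, t=1)
large, _, _ = adam_step(grad=100.0, m=0.0, v=0.0, t=1)
print(small, large)
```

On the first step both updates come out at roughly -learning_rate, because Adam normalizes by the gradient magnitude. This is why learning_rate is the value most worth pinning when comparing optimizer implementations.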

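Fix 4: Use Environment Variable (TF 2.16+)

From TF 2.16, tf.keras is Keras 3, which does not support the legacy optimizers. With the tf_keras package installed, setting TF_USE_LEGACY_KERAS=1 switches tf.keras back to Keras 2:

python
import os
os.environ['TF_USE_LEGACY_KERAS'] = '1'  # must be set before TensorFlow is imported

import tensorflow as tf
# tf.keras is now Keras 2, so tf.keras.optimizers.legacy.Adam is available again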

Or set the environment variable before running:

bash
TF_USE_LEGACY_KERAS=1 python train.py
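
Because the variable only takes effect if it is set before TensorFlow is imported, a small guard can catch the mistake early. The sketch below is illustrative: enable_legacy_keras is a hypothetical helper, and its two parameters exist only so the logic can be tried without touching the real process state.

```python
import os
import sys


def enable_legacy_keras(loaded_modules=None, environ=None):
    """Set TF_USE_LEGACY_KERAS=1 unless TensorFlow is already imported.

    Returns True if the variable was set, False if it was too late.
    """
    loaded_modules = sys.modules if loaded_modules is None else loaded_modules
    environ = os.environ if environ is None else environ
    if "tensorflow" in loaded_modules:
        return False  # TensorFlow already imported: setting the variable now has no effect
    environ["TF_USE_LEGACY_KERAS"] = "1"
    return True


# Demonstration with stand-in dictionaries instead of the real interpreter state.
print(enable_legacy_keras(loaded_modules={}, environ={}))                        # nothing imported yet
print(enable_legacy_keras(loaded_modules={"tensorflow": object()}, environ={}))  # imported too early
```

In a real entry-point script you would call enable_legacy_keras() with no arguments at the very top, before any import that pulls in TensorFlow.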

Fix 5: Upgrade to TF 2.16+

From TF 2.16, tf.keras is Keras 3, which has a single optimizer implementation and no longer supports the legacy namespace. Upgrading removes the inconsistency, but any code that references tf.keras.optimizers.legacy has to be migrated first:

bash
pip install "tensorflow>=2.16"

python
# TF 2.16+: consistent behavior
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')
model.compile(optimizer='adam', loss='mse')
# Both use the same (new) optimizer

Differences Between Old and New Adam

| Feature | Legacy Adam | New Adam (TF 2.11+) |
|---|---|---|
| Weight decay | Not built-in | Built-in via weight_decay param |
| EMA (Exponential Moving Average) | Not available | Built-in via use_ema param |
| Learning rate schedule | LearningRateSchedule or the decay argument | LearningRateSchedule only (decay argument removed) |
| Gradient clipping | Via clipnorm/clipvalue | Same |
| Mixed precision | Works | Improved support |
| Save/Load | Stable | Different checkpoint format |

python
# New Adam features not available in legacy
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    weight_decay=0.01,    # Decoupled weight decay (AdamW-style), not an L2 penalty
    use_ema=True,         # Exponential moving average of weights
    ema_momentum=0.99
)

Checking Which Optimizer You Have

python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
print(type(optimizer))
# The exact module path varies by release. A path containing
# 'legacy' or 'optimizer_v2' indicates the original implementation.

# Check if the legacy namespace exists
print(hasattr(tf.keras.optimizers, 'legacy'))
# Expected True in TF 2.11 to 2.15. On TF 2.16+ (Keras 3) the legacy
# optimizers are unsupported, so do not rely on this check alone.

# What does the string resolve to?
string_opt = tf.keras.optimizers.get('adam')
print(type(string_opt))
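
Reading the printed class path by eye is error-prone, so the check can be wrapped in a helper. This is a heuristic sketch: optimizer_origin is a hypothetical function, and it assumes the original implementation lives in a module whose path contains legacy or optimizer_v2, which may not hold for every release. The stand-in classes below only mimic module paths so the example runs without TensorFlow.

```python
def optimizer_origin(optimizer):
    """Return (qualified class path, looks_legacy) for an optimizer instance.

    Heuristic: treats 'legacy' or 'optimizer_v2' in the module path as the
    original implementation. This is an assumption, not a documented contract.
    """
    cls = type(optimizer)
    path = f"{cls.__module__}.{cls.__qualname__}"
    module_parts = cls.__module__.split(".")
    looks_legacy = "legacy" in module_parts or "optimizer_v2" in module_parts
    return path, looks_legacy


# Stand-in classes with made-up module paths, used instead of real optimizers.
FakeLegacyAdam = type("Adam", (), {"__module__": "keras.optimizers.legacy.adam"})
FakeNewAdam = type("Adam", (), {"__module__": "keras.optimizers.adam"})

print(optimizer_origin(FakeLegacyAdam()))
print(optimizer_origin(FakeNewAdam()))
```

With TensorFlow installed, passing model.optimizer after model.compile() shows which implementation a given spelling actually produced.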

Model Save/Load Compatibility

python
# Models saved with legacy Adam may not load with new Adam
# Save with legacy:
model.compile(optimizer=tf.keras.optimizers.legacy.Adam(), loss='mse')
model.save('my_model.keras')

# Load: may emit warnings about an optimizer mismatch
# (the exact message depends on the TensorFlow/Keras version)
loaded = tf.keras.models.load_model('my_model.keras')

To avoid issues, save with the same optimizer version you plan to load with.

Common Pitfalls

  • Assuming string and class are identical: In TF 2.11 through 2.15, "adam" and tf.keras.optimizers.Adam() can be different classes. Always verify which implementation you are using with type(optimizer).
  • Mixing legacy and new optimizers in a project: Using legacy Adam for some models and new Adam for others causes confusion and checkpoint incompatibility. Pick one and be consistent.
  • Ignoring the TF_USE_LEGACY_KERAS variable: This environment variable must be set BEFORE importing TensorFlow. Setting it after import tensorflow has no effect. It applies to TF 2.16+ and requires the tf_keras package.
  • Weight decay confusion: The new Adam with weight_decay applies decoupled weight decay (AdamW), which is different from L2 regularization. Using both weight_decay in the optimizer and kernel_regularizer=l2() in layers applies regularization twice.
  • Custom training loops: The new optimizer's internals differ from the legacy one. Code that reaches into private optimizer state (like optimizer._decayed_lr) may break when switching from legacy to new.
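
The weight decay pitfall is easiest to see with numbers. The sketch below compares one update for a single scalar weight under an L2 penalty and under decoupled weight decay. It uses a textbook first-step Adam update rather than Keras's code, and it assumes the L2 penalty is (decay / 2) * weight**2, so its gradient is decay * weight. All names and values are illustrative.

```python
import math


def adam_delta(grad, learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Textbook Adam update for the first step (t=1, zero initial moments)."""
    m_hat = ((1.0 - beta_1) * grad) / (1.0 - beta_1)
    v_hat = ((1.0 - beta_2) * grad * grad) / (1.0 - beta_2)
    return -learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)


weight, grad, decay, lr = 2.0, 0.5, 0.01, 0.001

# L2 regularization: the penalty gradient is added to the loss gradient,
# then the sum goes through Adam's normalization.
l2_weight = weight + adam_delta(grad + decay * weight, learning_rate=lr)

# Decoupled weight decay: the weight is shrunk directly, outside Adam's normalization.
decoupled_weight = weight - lr * decay * weight + adam_delta(grad, learning_rate=lr)

print(l2_weight, decoupled_weight)
```

On this first step the L2 term is almost entirely absorbed by Adam's normalization, while the decoupled term shrinks the weight by lr * decay * weight regardless of the gradient. Enabling both at once stacks the two effects.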

Summary

  • TF 2.11 through 2.15 ship two Adam implementations: new (tf.keras.optimizers.Adam) and legacy (tf.keras.optimizers.legacy.Adam)
  • The string "adam" can resolve to legacy in those releases, causing a mismatch with the class
  • Use tf.keras.optimizers.legacy.Adam() for backward-compatible behavior, and check what the string "adam" resolves to with tf.keras.optimizers.get('adam')
  • Upgrade to TF 2.16+ (Keras 3), where only one implementation remains
  • The new Adam adds weight decay and EMA support but has a different checkpoint format
