Default Adam optimizer doesn't work in tf.keras but string adam does

Introduction

tf.keras.optimizers.Adam() can behave differently from the string "adam" in model.compile() because TensorFlow ships two Adam implementations. Starting with TF 2.11, tf.keras.optimizers.Adam refers to the new optimizer (previously exposed as tf.keras.optimizers.experimental.Adam), while the string "adam" resolves to the legacy tf.keras.optimizers.legacy.Adam. The two differ in areas such as learning rate decay handling and how optimizer variables are created, which can cause training to fail or produce poor results.

The Problem

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1)
])

# This might not train properly
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')

# This works fine
model.compile(optimizer='adam', loss='mse')
```

Both should be equivalent, but in TF 2.11-2.14, they resolve to different implementations.

Why It Happens

TensorFlow 2.11 introduced new optimizer classes under tf.keras.optimizers. The old optimizers were moved to tf.keras.optimizers.legacy:

```python
# TF 2.10 and earlier: a single implementation
tf.keras.optimizers.Adam  # The only Adam

# TF 2.11+: two different Adams
tf.keras.optimizers.Adam           # NEW implementation
tf.keras.optimizers.legacy.Adam    # OLD (proven) implementation

# The string 'adam' maps to:
# TF 2.11-2.14: legacy.Adam (for backward compatibility)
# TF 2.15+: the new Adam
```

Fix 1: Use the Legacy Optimizer

```python
# Explicitly use the legacy optimizer
model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001),
    loss='mse'
)
```

This gives you the same optimizer that the string "adam" resolves to.
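If the same script has to run on several TensorFlow versions, one option is to detect the legacy namespace at runtime rather than hard-coding it. The sketch below is a hypothetical helper (`pick_adam_class` is not a TensorFlow API) and uses stand-in objects so that it runs without TensorFlow installed; in a real project you would pass `tf.keras.optimizers` instead.

```python
from types import SimpleNamespace


def pick_adam_class(optimizers_module, prefer_legacy=True):
    """Return legacy.Adam when the namespace offers it, otherwise Adam.

    `optimizers_module` is expected to look like tf.keras.optimizers:
    an object with an `Adam` attribute and, optionally, a `legacy`
    attribute that also has `Adam`.
    """
    legacy = getattr(optimizers_module, "legacy", None)
    if prefer_legacy and legacy is not None and hasattr(legacy, "Adam"):
        return legacy.Adam
    return optimizers_module.Adam


# Stand-in classes and namespaces (assumptions for illustration only).
class NewAdam:
    pass


class LegacyAdam:
    pass


with_legacy = SimpleNamespace(Adam=NewAdam, legacy=SimpleNamespace(Adam=LegacyAdam))
without_legacy = SimpleNamespace(Adam=NewAdam)

print(pick_adam_class(with_legacy).__name__)
print(pick_adam_class(without_legacy).__name__)

# With TensorFlow installed, the call would look like:
#   AdamClass = pick_adam_class(tf.keras.optimizers)
#   model.compile(optimizer=AdamClass(learning_rate=0.001), loss='mse')
```

Finding the attribute only shows that the name exists. Some newer releases may expose legacy names that fail when constructed, so it is worth instantiating the returned class once at startup.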

Fix 2: Use the String

```python
# Let TensorFlow resolve to the correct implementation
model.compile(optimizer='adam', loss='mse')

# With custom learning rate via string
model.compile(
    optimizer=tf.keras.optimizers.deserialize({
        'class_name': 'adam',
        'config': {'learning_rate': 0.001}
    }),
    loss='mse'
)
```

Fix 3: Set the Learning Rate Explicitly

The new Adam optimizer handles learning rate decay and schedules differently from the legacy class. Spelling out the learning rate and the other hyperparameters makes the configuration explicit and often fixes training issues:

```python
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7
)
model.compile(optimizer=optimizer, loss='mse')
```
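To see what these four hyperparameters control, it can help to write the Adam update by hand. The following is a minimal pure-Python sketch of the textbook Adam rule for a single scalar parameter. It is not the Keras implementation, and `adam_minimize` is a hypothetical name used only for this illustration.

```python
import math


def adam_minimize(grad_fn, x0, steps=200, learning_rate=0.001,
                  beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Run `steps` Adam updates on one scalar and return the result."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta_1 * m + (1 - beta_1) * g        # first-moment estimate
        v = beta_2 * v + (1 - beta_2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta_1 ** t)            # bias correction
        v_hat = v / (1 - beta_2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


def grad(x):
    # Gradient of f(x) = x**2
    return 2 * x


x_default = adam_minimize(grad, x0=1.0)
x_faster = adam_minimize(grad, x0=1.0, learning_rate=0.01)
print(x_default, x_faster)
```

Because the step is normalized by the second-moment estimate, each early update moves the parameter by roughly `learning_rate`, which is why a mismatched learning rate is usually the first thing to rule out.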

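Fix 4: Use Environment Variable (TF 2.16+)

```python
import os

# Must be set before TensorFlow is imported.
# This variable tells tf.keras to load Keras 2 from the separate
# tf-keras package (pip install tf-keras) instead of Keras 3.
os.environ['TF_USE_LEGACY_KERAS'] = '1'

import tensorflow as tf
# tf.keras now refers to Keras 2, where tf.keras.optimizers.legacy.Adam is available
```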

Or set the environment variable before running:

```bash
TF_USE_LEGACY_KERAS=1 python train.py
```
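Because the variable only takes effect if it is set before TensorFlow is imported, a small guard can catch the ordering mistake early. This is a sketch under that assumption; `enable_legacy_keras` is a hypothetical helper, and the demonstration passes stand-in dictionaries so that the real environment is left untouched.

```python
import os
import sys


def enable_legacy_keras(environ=None, modules=None):
    """Set TF_USE_LEGACY_KERAS unless TensorFlow is already imported.

    Returns True if the variable was set, False if it would be too late.
    """
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    if "tensorflow" in modules:
        return False
    environ["TF_USE_LEGACY_KERAS"] = "1"
    return True


# Demonstration with stand-in dictionaries, so nothing global is modified.
fake_env = {}
print(enable_legacy_keras(fake_env, modules={}))
print(enable_legacy_keras({}, modules={"tensorflow": object()}))

# In a real script, call enable_legacy_keras() at the very top,
# before any `import tensorflow` statement.
```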

Fix 5: Upgrade to TF 2.15+

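In TF 2.15+, the new optimizers are mature and stable, and both the class and the string resolve to the new implementation. The legacy classes remain importable in TF 2.15 itself and stop being usable once tf.keras defaults to Keras 3 (TF 2.16). Upgrading resolves the inconsistency: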

```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "tensorflow>=2.15"
```

```python
# TF 2.15+: consistent behavior
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')
model.compile(optimizer='adam', loss='mse')
# Both use the same (new) optimizer
```

Differences Between Old and New Adam

| Feature | Legacy Adam | New Adam (TF 2.11+) |
| --- | --- | --- |
| Weight decay | Not built-in | Built-in via `weight_decay` parameter |
| EMA (exponential moving average) | Not available | Built-in via `use_ema` parameter |
| Learning rate schedule | Float or `LearningRateSchedule`, plus a `decay` argument | Float or `LearningRateSchedule`; no `decay` argument |
| Gradient clipping | Manual via `clipnorm`/`clipvalue` | Same |
| Mixed precision | Works | Improved support |
| Save/Load | Stable | Different checkpoint format |

```python
# New Adam features not available in legacy
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    weight_decay=0.01,    # Decoupled weight decay (not the same as L2 regularization)
    use_ema=True,         # Exponential moving average of weights
    ema_momentum=0.99
)
```
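As a rough picture of what `use_ema` and `ema_momentum` mean, the sketch below keeps an exponential moving average of a list of weights in plain Python. It is a conceptual illustration rather than Keras code; `update_ema` and the fake training loop are assumptions made for this example.

```python
def update_ema(ema_weights, weights, ema_momentum=0.99):
    """Blend the current weights into their running average."""
    return [
        ema_momentum * e + (1 - ema_momentum) * w
        for e, w in zip(ema_weights, weights)
    ]


weights = [1.0, -2.0]
ema = list(weights)  # start the average at the initial weights

for step in range(3):
    # Stand-in for an optimizer step: shift every weight by a fixed amount.
    weights = [w - 0.1 for w in weights]
    ema = update_ema(ema, weights)

print(weights)
print(ema)
```

With a momentum close to 1, the averaged weights trail the raw weights and change smoothly, which is the property EMA is normally used for at evaluation time.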

Checking Which Optimizer You Have

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
print(type(optimizer))
# TF 2.11+: a class from the new optimizer module (exact module path varies by release)
# TF 2.10-: a class whose module path contains optimizer_v2

# Check if the legacy namespace exists
print(hasattr(tf.keras.optimizers, 'legacy'))
# True in TF 2.11-2.14

# What does the string resolve to?
string_opt = tf.keras.optimizers.get('adam')
print(type(string_opt))
```
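Reading two `type(...)` printouts by eye is error-prone, so the comparison can be wrapped in a pair of small helpers. This is a sketch with hypothetical function names; it is demonstrated on built-in objects so that it runs without TensorFlow, and it works the same way on two optimizer instances.

```python
def describe(obj):
    """Return the fully qualified class name of an object."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def same_implementation(a, b):
    """True only if both objects are instances of exactly the same class."""
    return type(a) is type(b)


# Demonstration on built-ins; with TensorFlow you would pass
# tf.keras.optimizers.Adam() and tf.keras.optimizers.get('adam').
print(describe(1.5))
print(same_implementation(1, 2))
print(same_implementation(1, 2.0))
```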

Model Save/Load Compatibility

```python
# Models saved with legacy Adam may not load cleanly with new Adam
# Save with legacy:
model.compile(optimizer=tf.keras.optimizers.legacy.Adam(), loss='mse')
model.save('my_model.keras')

# Load: may emit warnings about an optimizer mismatch
# (the exact message depends on the TensorFlow version)
loaded = tf.keras.models.load_model('my_model.keras')
```

To avoid issues, save with the same optimizer version you plan to load with.

Common Pitfalls

Assuming string and class are identical: In TF 2.11-2.14, "adam" and tf.keras.optimizers.Adam() are different classes. Always verify which implementation you are using with type(optimizer).
Mixing legacy and new optimizers in a project: Using legacy Adam for some models and new Adam for others causes confusion and checkpoint incompatibility. Pick one and be consistent.
Ignoring the TF_USE_LEGACY_KERAS variable: This environment variable must be set BEFORE importing TensorFlow. Setting it after import tensorflow has already run has no effect.
Weight decay confusion: The new Adam with weight_decay applies decoupled weight decay (AdamW), which is different from L2 regularization. Using both weight_decay in the optimizer and kernel_regularizer=l2() in layers applies regularization twice.
Custom training loops: The new optimizer has a different API for apply_gradients. Code that accesses internal optimizer state (like optimizer._decayed_lr) may break when switching from legacy to new.
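The weight decay pitfall above is easier to see with numbers. The sketch below compares a single first Adam step under L2 regularization with one under decoupled weight decay, using plain Python. It is a simplified illustration and not the Keras implementation; `adam_first_step` and the example values are assumptions chosen for this demonstration.

```python
import math


def adam_first_step(grad, learning_rate=0.001, beta_1=0.9,
                    beta_2=0.999, epsilon=1e-7):
    """Size of the Adam update at step t=1, starting from zero moments."""
    m_hat = ((1 - beta_1) * grad) / (1 - beta_1)          # bias-corrected mean
    v_hat = ((1 - beta_2) * grad * grad) / (1 - beta_2)   # bias-corrected variance
    return learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)


w, loss_grad, lr, decay = 2.0, 0.5, 0.001, 0.01

# L2 regularization: the penalty gradient (decay * w) is added to the loss
# gradient, and the sum is then rescaled by Adam's normalization.
w_l2 = w - adam_first_step(loss_grad + decay * w, lr)

# Decoupled weight decay: the weight is shrunk directly, and Adam only
# sees the loss gradient.
w_decoupled = w - lr * decay * w - adam_first_step(loss_grad, lr)

print(w_l2, w_decoupled)
```

In this example the L2 penalty is almost entirely absorbed by Adam's normalization, while the decoupled decay shrinks the weight by `lr * decay * w` regardless of the gradient scale. Applying both at once therefore stacks two different effects rather than one.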

Summary

TF 2.11+ has two Adam implementations: new (tf.keras.optimizers.Adam) and legacy (tf.keras.optimizers.legacy.Adam)
The string "adam" resolves to legacy in TF 2.11-2.14, causing a mismatch with the class
Use tf.keras.optimizers.legacy.Adam() or the string "adam" for backward-compatible behavior
Upgrade to TF 2.15+ where the inconsistency is resolved
The new Adam adds weight decay and EMA support but has a different checkpoint format