How to create an optimizer in TensorFlow

Introduction

In TensorFlow, "create an optimizer" usually means instantiating one of the optimizer classes that updates model weights during training. You choose the algorithm, configure its learning rate and related hyperparameters, and then either pass it to model.compile or use it directly inside a custom training loop.

Creating a Built-In Optimizer

The simplest example is Adam:

```python
import tensorflow as tf

# Instantiate Adam with an explicit learning rate.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001
)

print(type(optimizer).__name__)
```

This object now knows how to apply gradient updates, but it does nothing until you use it in training.
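
The introduction mentions configuring "related hyperparameters" as well as the learning rate. As a minimal sketch, the block below writes out Adam's main hyperparameters explicitly and reads them back with get_config(). The values shown are the Keras defaults, written out only for illustration.

```python
import tensorflow as tf

# Spell out Adam's main hyperparameters instead of relying on the defaults.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,  # step size
    beta_1=0.9,           # decay rate for the running mean of gradients
    beta_2=0.999,         # decay rate for the running mean of squared gradients
    epsilon=1e-7,         # small constant for numerical stability
)

# get_config() returns the hyperparameters as a plain dictionary.
config = optimizer.get_config()
print(config["beta_1"], config["beta_2"], config["epsilon"])
```

Reading the configuration back is a quick way to confirm what an optimizer was actually built with before a long training run.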

Using the Optimizer With model.compile

Most Keras workflows pass the optimizer to compile:

```python
import numpy as np
import tensorflow as tf

# Toy data for y = 2x.
x = np.array([[0.0], [1.0], [2.0], [3.0]], dtype="float32")
y = np.array([[0.0], [2.0], [4.0], [6.0]], dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1)
])

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# compile attaches the optimizer and the loss to the model.
model.compile(
    optimizer=optimizer,
    loss="mse"
)

model.fit(x, y, epochs=20, verbose=0)
```

This is the standard approach when you train a model with the high-level Keras API.
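
When the default hyperparameters are acceptable, compile also accepts the optimizer as a string name. The sketch below assumes a small throwaway model just to show the call; the optimizer instance that Keras creates is then available as model.optimizer.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])

# A string name selects the optimizer with its default hyperparameters.
model.compile(optimizer="adam", loss="mse")

# Keras builds the optimizer object for you and stores it on the model.
optimizer_name = type(model.optimizer).__name__
print(optimizer_name)
```

Passing an instance, as in the earlier example, is the better choice as soon as you want a non-default learning rate.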

Using the Optimizer in a Custom Training Loop

If you need more control, use the optimizer directly with GradientTape.

```python
import tensorflow as tf

w = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(20):
    # Record the forward pass so gradients can be computed.
    with tf.GradientTape() as tape:
        loss = (w - 5.0) ** 2

    grads = tape.gradient(loss, [w])
    # apply_gradients expects (gradient, variable) pairs.
    optimizer.apply_gradients(zip(grads, [w]))

print("w =", w.numpy())
```

This pattern is useful for research code, unusual update rules, or multi-model training flows where model.fit is too restrictive.
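
To make the update concrete, here is the same loop written in plain Python, with no TensorFlow involved. It is a sketch of the basic SGD rule (no momentum), using the hand-derived gradient 2(w - 5) of the loss above. Each step shrinks the distance to 5 by a factor of 0.8, so after 20 steps w ends up close to 5 (about 4.94).

```python
# Plain-Python version of gradient descent on loss = (w - 5)^2.
w = 0.0
learning_rate = 0.1

for step in range(20):
    grad = 2.0 * (w - 5.0)        # d(loss)/dw, derived by hand
    w = w - learning_rate * grad  # the basic SGD update rule

print("w =", w)
```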

Choosing an Optimizer

Common starting points include:

  • SGD for simple baseline experiments
  • Adam for general-purpose deep learning
  • RMSprop for some recurrent or noisy-gradient setups

Creating the optimizer is easy, but choosing it well still depends on the problem and on tuning the learning rate.
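
One rough way to compare candidates is to train the same small model once per optimizer and look at the final loss. The sketch below reuses the toy data from the compile example; the learning rate of 0.01 and the 50 epochs are arbitrary assumptions, and because each model starts from different random weights this is an illustration of the workflow, not a fair benchmark.

```python
import numpy as np
import tensorflow as tf

x = np.array([[0.0], [1.0], [2.0], [3.0]], dtype="float32")
y = np.array([[0.0], [2.0], [4.0], [6.0]], dtype="float32")

# Factories so that each run gets a fresh optimizer with no leftover state.
# The learning rates are illustrative, not tuned.
candidates = {
    "SGD": lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adam": lambda: tf.keras.optimizers.Adam(learning_rate=0.01),
    "RMSprop": lambda: tf.keras.optimizers.RMSprop(learning_rate=0.01),
}

final_losses = {}
for name, make_optimizer in candidates.items():
    # Build a new model per optimizer so the runs do not share weights.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=make_optimizer(), loss="mse")
    history = model.fit(x, y, epochs=50, verbose=0)
    final_losses[name] = float(history.history["loss"][-1])

for name, loss in final_losses.items():
    print(f"{name}: final loss = {loss:.4f}")
```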

Learning Rate Schedules

An optimizer does not have to use a fixed learning rate. You can attach a schedule:

```python
import tensorflow as tf

# Multiply the learning rate by 0.96 for every 100 optimizer steps.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=100,
    decay_rate=0.96
)

optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```

This is often more useful in practice than trying to invent a custom optimizer from scratch.
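
To see what this schedule produces, the sketch below recomputes it in plain Python using the documented formula for the default, non-staircase mode: initial_learning_rate * decay_rate ** (step / decay_steps). The helper name exponential_decay is made up for this illustration. With the values above, step 100 gives 0.01 × 0.96 = 0.0096.

```python
def exponential_decay(step, initial_learning_rate=0.01,
                      decay_steps=100, decay_rate=0.96):
    """Continuous (non-staircase) exponential decay, written out by hand."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)

# Print the learning rate at a few training steps.
for step in (0, 100, 200, 1000):
    print(step, exponential_decay(step))
```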

Optimizer State Matters

Most optimizers carry internal state in addition to the model weights. For example, Adam keeps moving averages of gradients and squared gradients. That means two optimizers with the same learning rate are not interchangeable once training has already progressed.
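
The effect of that state can be sketched in plain Python. The block below implements the textbook Adam update for a single scalar weight; it is an illustration, not the Keras source code, and the adam_step helper and its state dictionary are hypothetical names. A fresh optimizer and one that has already seen ten gradients of +1.0 receive the same new gradient of -1.0, and they move the weight in opposite directions.

```python
import math

def adam_step(grad, state, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one textbook Adam update to `state`; return the weight change."""
    state["t"] += 1
    # Moving averages of the gradient and of the squared gradient.
    state["m"] = beta_1 * state["m"] + (1 - beta_1) * grad
    state["v"] = beta_2 * state["v"] + (1 - beta_2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - beta_1 ** state["t"])
    v_hat = state["v"] / (1 - beta_2 ** state["t"])
    return -learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)

fresh = {"t": 0, "m": 0.0, "v": 0.0}
warm = {"t": 0, "m": 0.0, "v": 0.0}

# Give the "warm" optimizer a history of ten positive gradients.
for _ in range(10):
    adam_step(1.0, warm)

# Same learning rate, same new gradient, different state.
fresh_delta = adam_step(-1.0, fresh)
warm_delta = adam_step(-1.0, warm)
print("fresh optimizer moves w by:", fresh_delta)
print("warm optimizer moves w by: ", warm_delta)
```

The warm optimizer still carries momentum from its earlier gradients, which is why swapping in a new optimizer mid-training changes the trajectory even when the learning rate is identical.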

In practice, this matters when:

  • resuming training from checkpoints
  • changing optimizers mid-training
  • comparing experiments fairly

Once training starts, the optimizer's state is part of the training process, so account for it whenever you save, resume, or compare runs.

Common Pitfalls

The most common mistake is treating the optimizer as the entire training algorithm. The optimizer only applies updates; you still need a sensible model, loss, data pipeline, and training setup.

Another issue is picking an optimizer but leaving the learning rate at a poor value. Bad learning-rate choices often look like optimizer problems when they are really tuning problems.

A third pitfall is using apply_gradients with variables and gradients in the wrong order. The API expects pairs of (gradient, variable).
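
The same ordering applies when a model has several variables. As a sketch, the custom loop below trains the small Dense model from the compile example on the same toy data and pairs each gradient with its variable in the order the API expects.

```python
import numpy as np
import tensorflow as tf

x = np.array([[0.0], [1.0], [2.0], [3.0]], dtype="float32")
y = np.array([[0.0], [2.0], [4.0], [6.0]], dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

losses = []
for step in range(20):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = tf.reduce_mean(tf.square(y - predictions))  # mean squared error

    variables = model.trainable_variables
    grads = tape.gradient(loss, variables)
    # Gradient first, variable second: zip(grads, variables), not the reverse.
    optimizer.apply_gradients(zip(grads, variables))
    losses.append(float(loss))

print("first loss:", losses[0], "last loss:", losses[-1])
```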

Finally, do not assume you need to subclass an optimizer just because the title says "create." In most projects, instantiating a built-in optimizer is the right level of customization.

Summary

  • In TensorFlow, creating an optimizer usually means instantiating a tf.keras.optimizers class.
  • Pass the optimizer to model.compile for normal Keras training.
  • Use GradientTape and apply_gradients for custom training loops.
  • Tune the learning rate carefully, since optimizer choice alone does not solve training problems.
  • Prefer built-in optimizers unless you truly need a custom update rule.
