How to do weight initialization with the Xavier rule in TensorFlow 2.0?

Introduction

In TensorFlow 2, Xavier initialization is usually called Glorot initialization. You do not need to implement the formula manually in most cases. TensorFlow already provides GlorotUniform and GlorotNormal, which are the standard Xavier-style initializers for dense neural network layers.

Xavier and Glorot Mean the Same Family

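Both names refer to Xavier Glorot, first author of the 2010 paper (with Yoshua Bengio) that introduced the scheme: "Xavier initialization" uses his first name, while TensorFlow and Keras use his surname, "Glorot." The idea is to scale the initial weights by layer size so that the variance of activations and gradients stays in a reasonable range as data moves through the network.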

In TensorFlow 2, the common choices are:

  • tf.keras.initializers.GlorotUniform()
  • tf.keras.initializers.GlorotNormal()

For most feed-forward layers, those are the direct answer.
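As a minimal sketch of how these initializers can be referenced, the snippet below builds three Dense layers: one with an explicit initializer object, one with the Keras string alias "glorot_uniform", and one with no initializer at all. It relies on Dense using Glorot uniform as its default kernel initializer; the layer size of 8 and the variable names are arbitrary choices for illustration.

```python
import tensorflow as tf

# Explicit initializer object.
explicit = tf.keras.layers.Dense(
    8, kernel_initializer=tf.keras.initializers.GlorotUniform()
)

# String alias, resolved by Keras when the layer is constructed.
by_name = tf.keras.layers.Dense(8, kernel_initializer="glorot_uniform")

# No initializer given: Dense falls back to its default kernel initializer.
default = tf.keras.layers.Dense(8)

# Print the class of the initializer each layer ended up with.
for layer in (explicit, by_name, default):
    print(type(layer.kernel_initializer).__name__)
```

Passing the initializer explicitly is still worthwhile when you want the choice to be visible in the code or need to set a seed.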

Use It in a Keras Layer

Here is the normal TensorFlow 2 style:

python
import tensorflow as tf

model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(
            64,
            activation="relu",
            kernel_initializer=tf.keras.initializers.GlorotUniform(),
            input_shape=(20,),
        ),
        tf.keras.layers.Dense(
            10,
            activation="softmax",
            kernel_initializer=tf.keras.initializers.GlorotUniform(),
        ),
    ]
)

model.summary()

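This is all you need in most cases. Keras creates the weights when the layer is built and applies the initializer to the layer kernel; the bias has its own initializer, which defaults to zeros. Dense already uses Glorot uniform as its default kernel initializer, so passing it explicitly mainly documents the choice.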
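To check that the initializer was actually applied, you can build a layer and inspect its kernel. The sketch below assumes a standalone Dense layer with 20 inputs and 64 units (matching the first layer above) and compares the largest absolute weight with the Glorot uniform bound sqrt(6 / (fan_in + fan_out)). The small tolerance is only there to absorb float32 rounding.

```python
import math
import tensorflow as tf

fan_in, fan_out = 20, 64
layer = tf.keras.layers.Dense(
    fan_out, kernel_initializer=tf.keras.initializers.GlorotUniform()
)
layer.build((None, fan_in))  # creates the kernel and bias variables

kernel = layer.kernel.numpy()
limit = math.sqrt(6.0 / (fan_in + fan_out))
largest = float(abs(kernel).max())

print("kernel shape:", kernel.shape)
print("largest |w|:", largest, "bound:", limit)
print("within bound:", largest <= limit + 1e-6)
```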

If you want reproducible initialization during experiments, pass a seed:

python
initializer = tf.keras.initializers.GlorotUniform(seed=42)

A fixed seed makes debugging and comparison runs much easier. It is especially helpful when you compare optimizer changes and want every run to start from the same weights.
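The effect of the seed can be sketched by drawing a kernel several times and comparing the results. The helper name sample_kernel is hypothetical, and the call to tf.random.set_seed is an extra precaution: on some older TensorFlow 2 releases, seeded random ops also depend on global random state, so resetting it keeps the comparison repeatable across versions.

```python
import numpy as np
import tensorflow as tf


def sample_kernel(seed, shape=(20, 64)):
    """Draw one Glorot-uniform kernel for the given seed (illustrative helper)."""
    # Reset global random state so the result does not depend on earlier calls.
    tf.random.set_seed(0)
    initializer = tf.keras.initializers.GlorotUniform(seed=seed)
    return np.asarray(initializer(shape=shape))


run_a = sample_kernel(seed=42)
run_b = sample_kernel(seed=42)
run_c = sample_kernel(seed=7)

print("same seed, same weights:", np.array_equal(run_a, run_b))
print("different seed, same weights:", np.array_equal(run_a, run_c))
```

If several layers should start from different weights, give each layer its own initializer instance and seed rather than sharing one seeded object, since recent releases document that a seeded initializer returns the same values on every call.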

GlorotUniform Versus GlorotNormal

The two main Xavier-style choices differ only in the sampling distribution:

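  • GlorotUniform samples from a uniform distribution on [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out))
  • GlorotNormal samples from a truncated normal distribution centered on zero, with stddev = sqrt(2 / (fan_in + fan_out))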

Both use fan-in and fan-out information from the layer shape. In practice, GlorotUniform is a common default.

Example with the normal variant:

python
initializer = tf.keras.initializers.GlorotNormal()

layer = tf.keras.layers.Dense(
    32,
    kernel_initializer=initializer,
)

If you do not have a strong reason to prefer one, GlorotUniform is a safe and standard choice.
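One way to see why the choice rarely matters is to compare the spread of the two variants empirically. The sketch below samples a 256 x 256 kernel from each (the shape and seed are arbitrary) and compares the measured standard deviation with sqrt(2 / (fan_in + fan_out)), the value both variants are designed to target.

```python
import math
import numpy as np
import tensorflow as tf

fan_in, fan_out = 256, 256
shape = (fan_in, fan_out)

uniform = np.asarray(tf.keras.initializers.GlorotUniform(seed=0)(shape=shape))
normal = np.asarray(tf.keras.initializers.GlorotNormal(seed=0)(shape=shape))

limit = math.sqrt(6.0 / (fan_in + fan_out))       # hard bound for the uniform variant
target_std = math.sqrt(2.0 / (fan_in + fan_out))  # spread both variants aim for

print("uniform max |w|:", float(np.abs(uniform).max()), "limit:", limit)
print("uniform std:", float(uniform.std()), "target:", target_std)
print("normal std:", float(normal.std()), "target:", target_std)
```

The two samples differ in shape (flat and bounded versus bell-shaped with clipped tails), but their overall scale should come out close to the same value.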

Manual Initialization Is Usually Unnecessary

You can compute Xavier-style limits yourself, but that is rarely helpful unless you are building a custom layer from raw variables:

python
import tensorflow as tf
import math

fan_in = 20
fan_out = 64
limit = math.sqrt(6.0 / (fan_in + fan_out))

weights = tf.Variable(
    tf.random.uniform((fan_in, fan_out), minval=-limit, maxval=limit)
)

This shows the idea, but the built-in initializers are better because they infer fan values correctly for many layer shapes and keep your code shorter.
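Convolution kernels are a case where the fan values are easy to get wrong by hand. As a rough sketch, assuming a Conv2D-style kernel shape of (height, width, in_channels, out_channels), the built-in initializer multiplies the channel counts by the receptive field size. The snippet computes the fans by hand for an arbitrary 3 x 3 kernel and checks that the sampled weights respect the resulting bound.

```python
import math
import numpy as np
import tensorflow as tf

# Conv2D kernel shape: (kernel_height, kernel_width, in_channels, out_channels)
shape = (3, 3, 16, 32)
receptive_field = shape[0] * shape[1]
fan_in = receptive_field * shape[2]   # 3 * 3 * 16 = 144
fan_out = receptive_field * shape[3]  # 3 * 3 * 32 = 288
limit = math.sqrt(6.0 / (fan_in + fan_out))

kernel = np.asarray(tf.keras.initializers.GlorotUniform(seed=0)(shape=shape))
largest = float(np.abs(kernel).max())

print("fan_in:", fan_in, "fan_out:", fan_out)
print("largest |w|:", largest, "bound:", limit)
```

A manual version that used only the channel counts (16 and 32) would produce a bound three times larger than the one the built-in initializer applies.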

Pick an Initializer That Matches the Network

Glorot initialization is a strong default for many dense layers and tanh-like activations. But initializer choice is not one-size-fits-all. For example, He initialization is often preferred with ReLU-heavy networks because it is designed around the variance behavior of that activation family.

So the practical advice is:

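  • use Glorot (Xavier) initialization for dense layers with tanh-like activations, or when you have no specific reason to deviate from the default
  • do not force it onto every architecture without thinking about the activation pattern; for ReLU-heavy networks, consider He initialization instead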

That distinction matters more than memorizing the name.
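As an illustration of mixing initializers by activation, the sketch below uses the Keras string alias "he_normal" for a ReLU hidden layer and "glorot_uniform" for the softmax output. The layer sizes mirror the earlier example and are otherwise arbitrary. He initialization scales by fan-in only, so its weights should come out with a larger spread than Glorot would give for the same shape.

```python
import math
import numpy as np
import tensorflow as tf

hidden = tf.keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal")
output = tf.keras.layers.Dense(10, activation="softmax", kernel_initializer="glorot_uniform")

# The Input object fixes the feature size, so the weights are created right away.
model = tf.keras.Sequential([tf.keras.Input(shape=(20,)), hidden, output])

hidden_std = float(np.std(hidden.kernel.numpy()))
he_target = math.sqrt(2.0 / 20)             # He: sqrt(2 / fan_in)
glorot_target = math.sqrt(2.0 / (20 + 64))  # Glorot: sqrt(2 / (fan_in + fan_out))

print("measured hidden std:", hidden_std)
print("He target:", he_target, "Glorot target for same shape:", glorot_target)
```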

Common Pitfalls

  • Looking for a separate "Xavier" API when TensorFlow exposes it as Glorot initialization.
  • Re-implementing the formula manually when a built-in initializer already exists.
  • Assuming initializer choice does not matter for training stability.
  • Using a custom random initializer without considering fan-in and fan-out scaling.
  • Treating Xavier as the universal best initializer for every activation function.

Summary

  • In TensorFlow 2, Xavier initialization is typically GlorotUniform or GlorotNormal.
  • Use kernel_initializer=tf.keras.initializers.GlorotUniform() in Keras layers for the common case.
  • Manual implementation is possible but usually unnecessary.
  • Built-in Glorot initializers already account for layer shape.
  • Choose the initializer with some awareness of the model architecture and activation functions.
