How to do weight initialization by the Xavier rule in TensorFlow 2.0?
Introduction
In TensorFlow 2, Xavier initialization is usually called Glorot initialization. You do not need to implement the formula manually in most cases. TensorFlow already provides GlorotUniform and GlorotNormal, which are the standard Xavier-style initializers for dense neural network layers.
Xavier and Glorot Mean the Same Family
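Both names refer to the same person: Xavier Glorot, first author of the 2010 paper (with Yoshua Bengio) that introduced the scheme. "Xavier initialization" uses his given name, while TensorFlow and Keras use his surname, "Glorot." The idea is to scale the initial weights by the layer's fan-in and fan-out so that the variance of activations and gradients stays in a reasonable range as data moves through the network.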
In TensorFlow 2, the common choices are:
- tf.keras.initializers.GlorotUniform()
- tf.keras.initializers.GlorotNormal()
For most feed-forward layers, those are the direct answer.
Use It in a Keras Layer
Here is the normal TensorFlow 2 style:
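A minimal sketch of that style follows. The layer sizes, the tanh activation, and the variable names are illustrative assumptions rather than anything the question prescribes:

```python
import math

import numpy as np
import tensorflow as tf

# Hidden layer with an explicit Glorot (Xavier) uniform initializer.
hidden = tf.keras.layers.Dense(
    64,
    activation="tanh",
    kernel_initializer=tf.keras.initializers.GlorotUniform(),
)

# Illustrative model: 20 input features, one hidden layer, one output unit.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    hidden,
    tf.keras.layers.Dense(
        1, kernel_initializer=tf.keras.initializers.GlorotUniform()
    ),
])

# The kernel has shape (fan_in, fan_out); its values should lie inside
# [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)).
limit = math.sqrt(6.0 / (20 + 64))
max_abs = float(np.abs(hidden.kernel.numpy()).max())
print(hidden.kernel.shape, max_abs, limit)
```

The string shortcut kernel_initializer="glorot_uniform" should behave the same way when you do not need to pass arguments such as a seed.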
This is all you need in most cases. Keras handles weight creation and applies the initializer to the layer kernel.
If you want reproducible initialization during experiments, pass a seed:
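One way to sketch this, assuming a hypothetical helper make_layer and an arbitrary seed value of 42, is to build two identical layers and check that their starting kernels match:

```python
import numpy as np
import tensorflow as tf


def make_layer():
    # make_layer is a hypothetical helper for this example; the seed value
    # 42 is arbitrary.
    return tf.keras.layers.Dense(
        64,
        activation="tanh",
        kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
    )


layer_a = make_layer()
layer_b = make_layer()

# Build both layers for inputs with 20 features so the kernels are created.
layer_a.build((None, 20))
layer_b.build((None, 20))

# With the same seed, both layers should start from the same weights.
same_start = np.array_equal(layer_a.kernel.numpy(), layer_b.kernel.numpy())
print(same_start)
```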
That can make debugging and comparison runs much easier.
It is especially helpful when you compare optimizer changes and want the starting weights to stay identical across runs.
GlorotUniform Versus GlorotNormal
The two main Xavier-style choices differ only in the sampling distribution:
- GlorotUniform samples from a bounded uniform distribution
- GlorotNormal samples from a zero-mean truncated normal distribution
Both use fan-in and fan-out information from the layer shape. In practice, GlorotUniform is the common default: Keras Dense layers use it when no kernel_initializer is given.
Example with the normal variant:
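A short sketch, assuming an illustrative layer with 128 inputs and 32 units. The summary statistics are only there to show that the samples are centered on zero with a standard deviation near sqrt(2 / (fan_in + fan_out)):

```python
import math

import tensorflow as tf

# Illustrative sizes: 128 input features, 32 units.
layer = tf.keras.layers.Dense(
    32,
    activation="tanh",
    kernel_initializer=tf.keras.initializers.GlorotNormal(),
)
layer.build((None, 128))

weights = layer.kernel.numpy()

# Target standard deviation for the Glorot normal variant.
target_std = math.sqrt(2.0 / (128 + 32))
print(weights.shape, float(weights.mean()), float(weights.std()), target_std)
```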
If you do not have a strong reason to prefer one, GlorotUniform is a safe and standard choice.
Manual Initialization Is Usually Unnecessary
You can compute Xavier-style limits yourself, but that is rarely helpful unless you are building a custom layer from raw variables:
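As a rough sketch of the manual route, assuming a single dense layer with hypothetical sizes (fan_in = 20, fan_out = 64) and a hypothetical dense_forward helper, the uniform limit is sqrt(6 / (fan_in + fan_out)):

```python
import math

import tensorflow as tf

# Hypothetical layer sizes for this example.
fan_in, fan_out = 20, 64

# Glorot uniform samples from U(-limit, limit).
limit = math.sqrt(6.0 / (fan_in + fan_out))

kernel = tf.Variable(
    tf.random.uniform((fan_in, fan_out), minval=-limit, maxval=limit),
    name="kernel",
)
bias = tf.Variable(tf.zeros((fan_out,)), name="bias")


def dense_forward(x):
    # Hand-written dense layer using the raw variables above.
    return tf.tanh(tf.matmul(x, kernel) + bias)


x = tf.random.normal((8, fan_in))
y = dense_forward(x)
print(y.shape)
```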
This shows the idea, but the built-in initializers are better because they infer fan values correctly for many layer shapes and keep your code shorter.
Pick an Initializer That Matches the Network
Glorot initialization is a strong default for many dense layers and tanh-like activations. But initializer choice is not one-size-fits-all. For example, He initialization is often preferred with ReLU-heavy networks because it is designed around the variance behavior of that activation family.
So the practical advice is:
- use the Glorot (Xavier) initializers when you specifically want Xavier-style behavior
- do not force it onto every architecture without thinking about the activation pattern
That distinction matters more than memorizing the name.
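To illustrate matching the initializer to the activation, the sketch below pairs a ReLU layer with HeNormal and a tanh layer with GlorotUniform. The architecture and sizes are assumptions made for the example, not a recommendation for any particular model:

```python
import tensorflow as tf

# ReLU layer paired with He initialization.
relu_layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_initializer=tf.keras.initializers.HeNormal(),
)

# tanh layer paired with Glorot (Xavier) initialization.
tanh_layer = tf.keras.layers.Dense(
    32,
    activation="tanh",
    kernel_initializer=tf.keras.initializers.GlorotUniform(),
)

# Illustrative model with 20 input features and a single output unit.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    relu_layer,
    tanh_layer,
    tf.keras.layers.Dense(1),
])

print(relu_layer.kernel.shape, tanh_layer.kernel.shape)
```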
Common Pitfalls
- Looking for a separate "Xavier" API when TensorFlow exposes it as Glorot initialization.
- Re-implementing the formula manually when a built-in initializer already exists.
- Assuming initializer choice does not matter for training stability.
- Using a custom random initializer without considering fan-in and fan-out scaling.
- Treating Xavier as the universal best initializer for every activation function.
Summary
- In TensorFlow 2, Xavier initialization is typically GlorotUniform or GlorotNormal.
- Use kernel_initializer=tf.keras.initializers.GlorotUniform() in Keras layers for the common case.
- Manual implementation is possible but usually unnecessary.
- Built-in Glorot initializers already account for layer shape.
- Choose the initializer with some awareness of the model architecture and activation functions.

