
How can I make a trainable parameter in Keras?


Creating Trainable Parameters in Keras

Trainable parameters are the core of neural network learning. They are the weights and biases that the optimizer updates, using gradients computed by backpropagation, to minimize the loss function. While Keras provides built-in layers like Dense and Conv2D that manage their own trainable parameters, there are situations where you need to define custom trainable variables. This is common when building custom layers, implementing attention mechanisms, or adding learnable scaling factors.

This article covers three practical approaches to creating trainable parameters in Keras with TensorFlow 2.x.

Approach 1: Custom Layer with self.add_weight

The most common and recommended way to create trainable parameters is by subclassing tf.keras.layers.Layer and using the add_weight method. This integrates cleanly with Keras model serialization, summary printing, and the training loop.

python
import tensorflow as tf

class LinearLayer(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            name="kernel",
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
        )
        self.b = self.add_weight(
            name="bias",
            shape=(self.units,),
            initializer="zeros",
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        config = super().get_config()
        config.update({"units": self.units})
        return config

Using this layer in a model:

python
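model = tf.keras.Sequential([
    # Declaring the input with tf.keras.Input is preferred over passing
    # input_shape= to the first layer (deprecated in newer Keras versions)
    tf.keras.Input(shape=(784,)),
    LinearLayer(64),
    tf.keras.layers.ReLU(),
    LinearLayer(10),
    tf.keras.layers.Softmax(),
])

model.summary()
model.compile(optimizer="adam", loss="categorical_crossentropy")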

The build method is called automatically the first time the layer processes input. This deferred initialization means you do not need to specify the input dimension at construction time, which makes layers more flexible and reusable.
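To see the deferred build in action, the sketch below repeats LinearLayer so it runs on its own and checks the layer's state before and after its first call. The input shape (2, 3) is an arbitrary choice for illustration.

```python
import tensorflow as tf

class LinearLayer(tf.keras.layers.Layer):
    """Same layer as above, repeated so this snippet is self-contained."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # The input dimension is only known here, not in __init__
        self.w = self.add_weight(name="kernel", shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="bias", shape=(self.units,),
                                 initializer="zeros", trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = LinearLayer(4)
print(layer.built, len(layer.weights))  # False 0: no weights before the first call

y = layer(tf.ones((2, 3)))  # the first call runs build() with input_shape (2, 3)
shapes = [tuple(w.shape) for w in layer.weights]
print(layer.built, shapes)  # True [(3, 4), (4,)]
print(tuple(y.shape))       # (2, 4)
```

Only units is fixed at construction time; the kernel's first dimension (3 here) comes from whatever input the layer first sees.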

Approach 2: Learnable Scalar or Vector Parameters

Sometimes you need a single learnable scalar or a small vector that acts as a tunable coefficient. This is common in architectures that use learnable temperature scaling, attention score weighting, or feature gating.

python
class LearnableScaleLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.scale = self.add_weight(
            name="scale",
            shape=(1,),
            initializer=tf.keras.initializers.Constant(1.0),
            trainable=True,
        )
        self.shift = self.add_weight(
            name="shift",
            shape=(1,),
            initializer="zeros",
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        return inputs * self.scale + self.shift

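This pattern is useful for adding a learnable rescaling step (similar to the scale and offset parameters of a normalization layer) or for blending a skip connection with a tunable factor. Here the layer rescales the output of a Dense layer: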

python
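inputs = tf.keras.Input(shape=(128,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
x = LearnableScaleLayer()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
# count_params() counts every weight, trainable or not; summing over
# trainable_weights counts only the trainable ones (here 16512 + 2 + 1290)
trainable_count = sum(int(tf.size(w)) for w in model.trainable_weights)
print("Trainable parameters:", trainable_count)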
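The skip-connection use case can be sketched the same way. The layer below, LearnableBlend, is a hypothetical example (not a Keras built-in): it mixes a skip path with a transformed branch using one learnable scalar, assuming the scalar is passed through a sigmoid so the mix stays between 0 and 1.

```python
import tensorflow as tf

class LearnableBlend(tf.keras.layers.Layer):
    """Hypothetical layer: out = a * branch + (1 - a) * skip, with learnable a in (0, 1)."""

    def build(self, input_shape):
        # input_shape holds two shapes because the layer is called on [skip, branch]
        self.alpha_logit = self.add_weight(
            name="alpha_logit",
            shape=(),              # a single scalar
            initializer="zeros",   # sigmoid(0) = 0.5, an even mix at the start
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        skip, branch = inputs
        alpha = tf.sigmoid(self.alpha_logit)
        return alpha * branch + (1.0 - alpha) * skip

inputs = tf.keras.Input(shape=(128,))
h = tf.keras.layers.Dense(128, activation="relu")(inputs)
branch = tf.keras.layers.Dense(128, activation="relu")(h)
x = LearnableBlend()([h, branch])  # skip connection around the second Dense layer
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
```

The sigmoid is a design choice: an unconstrained scalar would also train, but it could push the blend outside the [0, 1] range.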

Approach 3: Using tf.Variable Directly

For quick experiments, or when porting code written against raw TensorFlow, you can create a parameter with tf.Variable directly inside a layer:

python
class AttentionWeightLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        dim = input_shape[-1]
        # A square (dim, dim) matrix keeps the scores the same shape as the
        # inputs, so the softmax output can gate them element-wise
        self.attention_weights = tf.Variable(
            tf.random.normal([dim, dim]),
            trainable=True,
            name="attention_weights",
        )
        # In tf.keras (Keras 2.x), assigning a tf.Variable as a layer attribute
        # is enough for Keras to track it; no manual registration is needed
        super().build(input_shape)

    def call(self, inputs):
        scores = tf.matmul(inputs, self.attention_weights)
        attention = tf.nn.softmax(scores, axis=-1)
        return inputs * attention

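This works in tf.keras (Keras 2.x), which tracks a tf.Variable assigned as a layer attribute automatically. Still, add_weight is preferred because it handles registration, serialization, and device placement for you, and Keras 3 may not track a raw tf.Variable as a layer weight at all. The tf.Variable approach is mainly useful when integrating non-Keras code or porting from raw TensorFlow.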
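For comparison, here is a sketch of the same attention-style gating written with add_weight, so Keras registers the weight itself. As an assumption, it uses a square (d, d) weight matrix so the softmax scores match the input shape; the class name AddWeightAttentionLayer and the glorot_uniform initializer are illustrative choices.

```python
import tensorflow as tf

class AddWeightAttentionLayer(tf.keras.layers.Layer):
    """Hypothetical add_weight version of an attention-style gating layer."""

    def build(self, input_shape):
        d = input_shape[-1]
        # add_weight creates and registers the variable in one step
        self.attention_weights = self.add_weight(
            name="attention_weights",
            shape=(d, d),  # square so the scores line up with the inputs
            initializer="glorot_uniform",
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        scores = tf.matmul(inputs, self.attention_weights)
        attention = tf.nn.softmax(scores, axis=-1)
        return inputs * attention  # element-wise gating of the inputs

layer = AddWeightAttentionLayer()
out = layer(tf.random.normal((4, 16)))
print(tuple(out.shape))              # (4, 16)
print(len(layer.trainable_weights))  # 1
```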

Freezing and Unfreezing Parameters

You can control which parameters are trainable at runtime. This is essential for transfer learning, where you freeze pretrained layers and only train the new layers:

python
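# Load a pretrained base and freeze all of its layers
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), weights="imagenet", include_top=False
)
base_model.trainable = False

# Add a trainable head
inputs = tf.keras.Input(shape=(224, 224, 3))
# MobileNetV2 expects pixels scaled to [-1, 1] (assumes raw [0, 255] input)
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
# training=False keeps BatchNormalization in inference mode, even after unfreezing
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Later, fine-tune only the last 20 layers: unfreeze the whole base first,
# then re-freeze everything before the last 20. Setting trainable=True on
# inner layers while base_model.trainable is False does not reliably unfreeze them.
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

# Recompile with a lower learning rate so the new flags take effect
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
)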
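The effect of these flags can be checked without downloading pretrained weights. In this sketch, a tiny hypothetical Sequential model named base stands in for MobileNetV2 and a single Dense layer acts as the head. The counts are numbers of weight tensors (a kernel and a bias per Dense layer), not individual parameters.

```python
import tensorflow as tf

# A tiny stand-in for a pretrained base model (hypothetical, for illustration)
base = tf.keras.Sequential(
    [tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(4)], name="base"
)
head = tf.keras.layers.Dense(2, name="head")

inputs = tf.keras.Input(shape=(8,))
model = tf.keras.Model(inputs, head(base(inputs)))

def weight_counts():
    """Return (number of trainable, number of non-trainable) weight tensors."""
    return len(model.trainable_weights), len(model.non_trainable_weights)

before = weight_counts()   # (4, 0): kernel + bias for each Dense layer

base.trainable = False     # freeze every layer inside the base
frozen = weight_counts()   # (2, 2): only the head remains trainable
model.compile(optimizer="adam", loss="mse")  # recompile so fit() sees the change

base.trainable = True      # unfreeze again for fine-tuning
unfrozen = weight_counts() # (4, 0)
model.compile(optimizer="adam", loss="mse")

print(before, frozen, unfrozen)
```

Frozen weights are not deleted; they move from trainable_weights to non_trainable_weights, which is why the total stays at four tensors throughout.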

Inspecting Trainable Parameters

You can list all trainable parameters in a model to verify your custom layers are set up correctly:

python
for layer in model.layers:
    for weight in layer.trainable_weights:
        print(f"{layer.name}/{weight.name}: shape={weight.shape}")
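Listing weights confirms they exist. Two further checks are how many individual numbers are trainable and whether gradients actually reach each custom weight. The sketch below assumes a compact copy of the Approach 2 layer and an arbitrary toy model; the helper names n_trainable and missing are illustrative.

```python
import numpy as np
import tensorflow as tf

class LearnableScaleLayer(tf.keras.layers.Layer):
    """Compact copy of the scale/shift layer from Approach 2."""

    def build(self, input_shape):
        self.scale = self.add_weight(name="scale", shape=(1,),
                                     initializer="ones", trainable=True)
        self.shift = self.add_weight(name="shift", shape=(1,),
                                     initializer="zeros", trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        return inputs * self.scale + self.shift

inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(4)(inputs)
x = LearnableScaleLayer()(x)
model = tf.keras.Model(inputs, x)

# Count individual scalars, not just weight tensors
n_trainable = sum(int(np.prod(tuple(w.shape))) for w in model.trainable_weights)
n_frozen = sum(int(np.prod(tuple(w.shape))) for w in model.non_trainable_weights)
print(n_trainable, n_frozen)  # 22 0  (Dense: 4*4 + 4, scale: 1, shift: 1)

# A weight that is disconnected from the loss gets a gradient of None
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(model(tf.ones((2, 4))) ** 2)
grads = tape.gradient(loss, model.trainable_weights)
missing = [w.name for w, g in zip(model.trainable_weights, grads) if g is None]
print("weights without gradients:", missing)  # []
```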

Common Pitfalls

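  • Skipping super().build(). When Keras builds a layer through its first call, it marks the layer as built even if your build method omits super().build(input_shape). Calling it anyway (or setting self.built = True) is still good practice, because it keeps the layer's state consistent if build is ever invoked directly.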
  • Not implementing get_config. Without get_config, your custom layer cannot be serialized or loaded with model.save and tf.keras.models.load_model. Always return any constructor arguments in the config dict.
  • Wrong initializer choice. Using zeros for weight matrices (not biases) can prevent the network from breaking symmetry during training. Use glorot_uniform or he_normal for weight matrices, and zeros for biases.
  • Creating variables in __init__ instead of build. Variables created in __init__ do not have access to the input shape, so their dimensions must be hardcoded. Create shape-dependent parameters in build.
  • Forgetting to recompile after changing trainable flags. Keras records which weights are trainable when model.compile is called, so changes to layer.trainable only take effect after you call compile again. Without recompiling, fit keeps training with the old trainable settings.
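The get_config pitfall can be checked with a round trip. This sketch repeats LinearLayer from Approach 1, rebuilds it from its config with the default from_config, and then copies the weights across, because a config describes the architecture but not the learned values. The layer name "proj" and the input shape are arbitrary choices.

```python
import tensorflow as tf

class LinearLayer(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(name="kernel", shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="bias", shape=(self.units,),
                                 initializer="zeros", trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        config = super().get_config()
        config.update({"units": self.units})  # every constructor argument goes here
        return config

original = LinearLayer(8, name="proj")
config = original.get_config()
clone = LinearLayer.from_config(config)  # the default from_config calls cls(**config)
print(clone.units, clone.name)           # 8 proj

# The config carries no weights: build both layers, then copy the values
x = tf.ones((1, 3))
original(x)
clone(x)
clone.set_weights(original.get_weights())
print(bool(tf.reduce_all(original(x) == clone(x))))  # True
```

If get_config omitted "units", from_config would fail because LinearLayer requires that argument.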

Summary

Keras provides several ways to create trainable parameters. The recommended approach is to subclass tf.keras.layers.Layer and use self.add_weight inside the build method, which gives you automatic shape inference, serialization support, and integration with the Keras training loop. For learnable scalars and vectors, the same pattern works with smaller shapes and constant initializers. Use tf.Variable directly only when you need raw TensorFlow compatibility. Always implement get_config for custom layers and choose appropriate initializers based on the role of each parameter.


