Building a multivariate, multi-task LSTM with Keras

Introduction

A multivariate multi-task LSTM takes a sequence with several features at each time step and produces multiple outputs for different prediction targets. In Keras, the clean way to build this is the Functional API: use a shared recurrent backbone to learn the sequence representation, then attach one output head per task.

What The Model Needs To Handle

There are two separate ideas in the title.

Multivariate means each time step has multiple input features, such as:

  • temperature
  • humidity
  • pressure
  • previous demand

Multi-task means the model predicts more than one target, such as:

  • next-hour demand
  • next-hour temperature class

The shared LSTM layers learn a common sequence encoding, and the task-specific heads learn what each target needs from that shared representation.

Input Shape For A Multivariate LSTM

Keras expects recurrent inputs in the shape:

batch x time_steps x features

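For example, if each sample contains 24 time steps and 5 features per step, pass shape=(24, 5) to tf.keras.Input. The batch dimension is left out of that argument, so a full batch of inputs has shape (batch, 24, 5).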
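Raw data usually arrives as a 2D table of rows by features, so it has to be sliced into windows before it matches this layout. The sketch below is one way to do that with NumPy; the helper name make_windows and the 100 x 5 random table are assumptions made for illustration, not part of Keras.

```python
import numpy as np

def make_windows(series, time_steps):
    """Slice a (n_rows, n_features) array into overlapping windows.

    Returns an array of shape (n_windows, time_steps, n_features).
    """
    series = np.asarray(series, dtype="float32")
    n_windows = series.shape[0] - time_steps + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than one window")
    # Window i covers rows i .. i + time_steps - 1.
    return np.stack([series[i:i + time_steps] for i in range(n_windows)])

# Hypothetical data: 100 hourly readings of 5 features.
raw = np.random.rand(100, 5)
X = make_windows(raw, time_steps=24)
print(X.shape)  # (77, 24, 5): batch x time_steps x features
```

Each window here overlaps its neighbour by 23 steps. A larger stride between windows is an equally valid choice when the data set is big.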

Example Model In Keras

Here is a minimal TensorFlow/Keras example with one regression head and one classification head.

python
import numpy as np
import tensorflow as tf

# Fake training data: 200 samples, 12 time steps, 4 features.
X = np.random.rand(200, 12, 4).astype("float32")
y_reg = np.random.rand(200, 1).astype("float32")
y_cls = np.random.randint(0, 3, size=(200,))

# One-hot encode the class labels to match categorical_crossentropy.
y_cls = tf.keras.utils.to_categorical(y_cls, num_classes=3)

# Shared backbone: one LSTM plus one dense layer used by both tasks.
inputs = tf.keras.Input(shape=(12, 4), name="sequence")
x = tf.keras.layers.LSTM(32, return_sequences=False)(inputs)
x = tf.keras.layers.Dense(16, activation="relu")(x)

# Task-specific heads. The layer names are reused as keys in compile and fit.
regression_output = tf.keras.layers.Dense(1, name="regression_head")(x)
classification_output = tf.keras.layers.Dense(3, activation="softmax", name="classification_head")(x)

model = tf.keras.Model(
    inputs=inputs,
    outputs=[regression_output, classification_output]
)

model.compile(
    optimizer="adam",
    loss={
        "regression_head": "mse",
        "classification_head": "categorical_crossentropy",
    },
    metrics={
        "regression_head": ["mae"],
        "classification_head": ["accuracy"],
    }
)

model.fit(
    X,
    {
        "regression_head": y_reg,
        "classification_head": y_cls,
    },
    epochs=3,
    batch_size=16,
    verbose=0,
)

# Prints one prediction array per output head.
print(model.predict(X[:2], verbose=0))

This example shows the core architecture: one shared LSTM trunk, two task-specific outputs.
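The final print call returns one array per head, which usually needs a little post-processing before use. The sketch below uses hand-written stand-in arrays rather than real model output, and it assumes the predictions come back as a list in the same order the heads were passed to Model(outputs=[...]).

```python
import numpy as np

# Stand-in for model.predict(X[:2]); the numbers are made up for illustration.
fake_predictions = [
    # regression_head, shape (2, 1)
    np.array([[0.42], [0.57]], dtype="float32"),
    # classification_head, shape (2, 3): one probability per class
    np.array([[0.2, 0.5, 0.3], [0.7, 0.2, 0.1]], dtype="float32"),
]

reg_pred, cls_prob = fake_predictions

# Drop the trailing axis to get one number per sample.
reg_values = reg_pred.squeeze(axis=-1)
# Pick the most probable class for each sample.
cls_labels = cls_prob.argmax(axis=-1)

print(reg_values.shape)  # (2,)
print(cls_labels)        # [1 0]
```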

Why Shared Layers Help

The main reason to use a multi-task model is that the tasks share what the backbone learns, an effect often called inductive transfer. If the tasks are related, the backbone can learn temporal patterns that help both of them.

For example, a shared representation of recent sensor history may improve both:

  • numeric forecasting
  • anomaly category prediction

That can reduce overfitting compared with training completely separate models.

When To Use return_sequences=True

In the simple example above, the last LSTM output is enough because each task makes one prediction for the whole window.

If each time step needs its own prediction, or if you want to stack another recurrent layer, use return_sequences=True.

python
x = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(16)(x)

This passes the full sequence to the next recurrent layer before collapsing it to one final representation.
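The other case, one prediction per time step, can be sketched as follows. This is a minimal illustration rather than a recommended architecture: the head name per_step_head and the random input are assumptions, and the sketch relies on a Dense layer acting on the last axis of a 3D tensor.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(12, 4), name="sequence")
# Keep the hidden state at every time step instead of only the last one.
x = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
# Dense is applied along the last axis, so each time step gets its own output.
per_step_output = tf.keras.layers.Dense(1, name="per_step_head")(x)

per_step_model = tf.keras.Model(inputs=inputs, outputs=per_step_output)

X = np.random.rand(8, 12, 4).astype("float32")
print(per_step_model.predict(X, verbose=0).shape)  # (8, 12, 1)
```

The labels for such a head need the same layout, here (batch, 12, 1), so that there is one target per time step.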

Loss Weighting Matters

Different tasks can have losses on very different numeric scales. If you do nothing, one task may dominate training.

Keras lets you weight the losses.

python
model.compile(
    optimizer="adam",
    loss={
        "regression_head": "mse",
        "classification_head": "categorical_crossentropy",
    },
    loss_weights={
        "regression_head": 0.5,
        "classification_head": 1.0,
    }
)

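The weights shown are illustrative. In real multi-task systems they are usually tuned, or the targets are rescaled, so that no single task dominates the total loss.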
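To see why scale matters, it helps to compute the combined objective by hand. Keras minimizes the weighted sum of the per-head losses; the sketch below reproduces that sum with NumPy on a made-up batch of two samples, where the regression target is an unscaled demand value. The helper functions and all numbers are assumptions for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over the batch."""
    return float(np.mean((y_true - y_pred) ** 2))

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy for one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

# Made-up batch: unscaled demand values and 3-class probabilities.
y_reg_true = np.array([[100.0], [120.0]])
y_reg_pred = np.array([[90.0], [125.0]])
y_cls_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_cls_prob = np.array([[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]])

reg_loss = mse(y_reg_true, y_reg_pred)                        # 62.5
cls_loss = categorical_crossentropy(y_cls_true, y_cls_prob)   # about 0.434

loss_weights = {"regression_head": 0.5, "classification_head": 1.0}
total_loss = (
    loss_weights["regression_head"] * reg_loss
    + loss_weights["classification_head"] * cls_loss
)
print(reg_loss, round(cls_loss, 3), round(total_loss, 3))
```

Even with a weight of 0.5, the regression term contributes about 31 of the roughly 31.7 total, so the gradient is driven almost entirely by that head. Standardizing the regression target or lowering its weight further are two common ways to rebalance.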

Common Pitfalls

A common mistake is getting the input shape wrong. Keras LSTMs expect time_steps x features per sample, not the other way around.

Another issue is mismatching output heads and label dictionaries. The names used in Model(outputs=...), compile, and fit must line up cleanly.
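A small check before training can catch a name mismatch early. The helper below is a hypothetical utility, not a Keras function; it compares the keys of a label dictionary against a list of output names. With a real model that list would typically come from model.output_names, which is assumed here rather than called.

```python
def check_label_keys(output_names, labels):
    """Raise KeyError if label keys and output head names do not match."""
    expected = set(output_names)
    provided = set(labels)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    if missing or unexpected:
        raise KeyError(
            f"missing labels for {missing}; unexpected keys {unexpected}"
        )

# Head names as used in the example model.
output_names = ["regression_head", "classification_head"]

# Matching keys: passes silently.
check_label_keys(
    output_names,
    {"regression_head": [0.1], "classification_head": [[1, 0, 0]]},
)

# A misspelled key is reported before any training starts.
try:
    check_label_keys(
        output_names,
        {"regression_head": [0.1], "classification": [[1, 0, 0]]},
    )
except KeyError as err:
    print(err)
```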

Developers also often ignore loss scaling. If one task has much larger numeric loss values, the shared backbone may mostly optimize that task and neglect the others.

Finally, do not assume multi-task learning always helps. If the tasks are unrelated, forcing them to share an LSTM representation can hurt both.

Summary

  • A multivariate multi-task LSTM uses sequences with multiple features and predicts more than one target.
  • The Keras Functional API is the right tool because it supports shared backbones and multiple output heads.
  • Input shape should be batch x time_steps x features.
  • Use separate losses and metrics for each output head, and add loss weights when needed.
  • Multi-task learning helps most when the tasks share useful temporal structure.
