Building a multivariate, multi-task LSTM with Keras

Introduction

A multivariate, multi-task LSTM takes several input features across time and predicts more than one target from the same sequence. The key design idea is to share one recurrent encoder across tasks, then branch into separate output heads so each target can learn its own final mapping.

Define the Problem Shape Clearly

There are two separate ideas in the title:

multivariate: each time step has multiple features
multi-task: the model predicts multiple outputs

For example, each sample might contain 24 hourly time steps with 5 features per step, and the model may predict:

next-day sales as a regression output
demand class as a classification output

That is a multi-task sequence model because one encoder supports two different prediction objectives.

Prepare Inputs as 3D Tensors

Keras LSTMs expect input in this shape:

text

(batch_size, timesteps, features)

A small synthetic example:

python

import numpy as np

num_samples = 500
num_steps = 24      # time steps per sample
num_features = 5    # features per time step

# Random data stands in for real sequences and targets.
X = np.random.rand(num_samples, num_steps, num_features).astype("float32")
y_reg = np.random.rand(num_samples, 1).astype("float32")
y_cls = np.random.randint(0, 3, size=(num_samples,))  # integer labels 0, 1, 2

Here:

X is the multivariate sequence input
y_reg is a regression target
y_cls is a 3-class classification target
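Real data rarely arrives in 3D form. A common starting point is a 2D table of shape (time, features) that has to be cut into fixed-length windows. The sketch below shows one way to do that with NumPy. The helper name make_windows and the 100-row table are assumptions made for illustration, not part of the original example.

```python
import numpy as np

def make_windows(series, num_steps):
    """Slice a (time, features) array into overlapping (num_steps, features) windows."""
    series = np.asarray(series, dtype="float32")
    num_windows = series.shape[0] - num_steps + 1
    if num_windows < 1:
        raise ValueError("series is shorter than num_steps")
    # Stack consecutive windows along a new leading (batch) axis.
    return np.stack([series[i:i + num_steps] for i in range(num_windows)])

# Hypothetical raw data: 100 hourly readings with 5 features each.
raw = np.random.rand(100, 5)
windows = make_windows(raw, num_steps=24)
print(windows.shape)  # (77, 24, 5)
```

Each window overlaps the previous one by all but one step. Whether overlapping windows are appropriate depends on the dataset and on how the train and validation splits are drawn.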

Use the Functional API for Multi-Task Models

A Sequential model describes a single linear stack of layers with one output, so it is not suited to multi-head models. Use the Functional API instead.

python

from tensorflow import keras
from tensorflow.keras import layers

# Shared trunk: one recurrent encoder for all tasks.
inputs = keras.Input(shape=(num_steps, num_features), name="sequence")
# Masking skips time steps whose features all equal 0.0 (useful for zero-padded sequences).
x = layers.Masking()(inputs)
x = layers.LSTM(64, return_sequences=False)(x)
x = layers.Dense(32, activation="relu")(x)

# Task-specific heads: linear output for regression, softmax for 3 classes.
sales_output = layers.Dense(1, name="sales_output")(x)
class_output = layers.Dense(3, activation="softmax", name="class_output")(x)

model = keras.Model(
    inputs=inputs,
    outputs={
        "sales_output": sales_output,
        "class_output": class_output,
    }
)

The LSTM learns a shared temporal representation, and each task gets its own head.
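Before training, it can be worth running one forward pass to confirm that each head produces the expected shape. The following is a minimal, self-contained sketch that assumes TensorFlow with its bundled Keras is installed. It rebuilds a model like the one described here (leaving out the Masking layer for brevity) and feeds it a random batch of 4 samples.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_steps, num_features = 24, 5

# Shared trunk followed by two heads.
inputs = keras.Input(shape=(num_steps, num_features), name="sequence")
x = layers.LSTM(64)(inputs)
x = layers.Dense(32, activation="relu")(x)
outputs = {
    "sales_output": layers.Dense(1, name="sales_output")(x),
    "class_output": layers.Dense(3, activation="softmax", name="class_output")(x),
}
model = keras.Model(inputs=inputs, outputs=outputs)

# Forward pass on a random batch (untrained weights, so values are meaningless).
batch = np.random.rand(4, num_steps, num_features).astype("float32")
out = model(batch, training=False)
sales = np.asarray(out["sales_output"])
probs = np.asarray(out["class_output"])

print(sales.shape)  # (4, 1)
print(probs.shape)  # (4, 3)
```

Each row of probs should sum to approximately 1 because the classification head ends in a softmax.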

Compile with Per-Task Losses

Different tasks often need different loss functions.

python

model.compile(
    optimizer="adam",
    loss={
        "sales_output": "mse",
        "class_output": "sparse_categorical_crossentropy",
    },
    metrics={
        "sales_output": ["mae"],
        "class_output": ["accuracy"],
    }
)

This is one of the biggest advantages of Keras multi-output models: each output can be trained with the loss that matches its semantics.
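To make the two losses concrete, here is a rough NumPy sketch of what each one measures on a tiny hand-made batch. It is an illustration of the formulas under simplifying assumptions (plain batch averaging, a small clipping constant), not the Keras implementation, and all values are invented.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error averaged over the batch."""
    return float(np.mean((y_true - y_pred) ** 2))

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true integer class."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Regression head: two samples, one off by 0.5 and one exact.
y_true = np.array([[1.0], [2.0]])
y_pred = np.array([[1.5], [2.0]])

# Classification head: integer labels and predicted class probabilities.
labels = np.array([0, 2])
probs = np.array([[0.5, 0.25, 0.25],
                  [0.25, 0.25, 0.5]])

reg_loss = mse(y_true, y_pred)
cls_loss = sparse_categorical_crossentropy(labels, probs)
print(reg_loss)  # 0.125
print(cls_loss)  # about 0.693, i.e. -log(0.5)
```

Note that the sparse variant takes integer labels such as 0, 1, 2 directly, which matches the y_cls array used earlier; one-hot labels would call for the non-sparse categorical crossentropy instead.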

Train with a Dictionary of Targets

The fit call should mirror the output names.

python

history = model.fit(
    X,
    {
        "sales_output": y_reg,
        "class_output": y_cls,
    },
    epochs=5,
    batch_size=32,
    validation_split=0.2,
)

The history object will include separate loss and metric traces for each task.

Loss Weighting Matters

One task can dominate training if its loss scale is much larger than the others. In that case, use loss_weights.

python

model.compile(
    optimizer="adam",
    loss={
        "sales_output": "mse",
        "class_output": "sparse_categorical_crossentropy",
    },
    loss_weights={
        "sales_output": 0.5,
        "class_output": 1.0,
    }
)

This is often necessary in real multi-task systems where regression and classification losses have very different numeric ranges.
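Keras minimizes the weighted sum of the individual losses, so the scale gap is easy to see with plain arithmetic. The sketch below uses an invented, unscaled sales target in the thousands and a uniform guess over 3 classes. The numbers and the total_loss helper are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unscaled sales target with values in the thousands.
y_sales = rng.normal(loc=5000.0, scale=800.0, size=1000)
naive_pred = np.full_like(y_sales, y_sales.mean())
sales_mse = float(np.mean((y_sales - naive_pred) ** 2))

# Crossentropy of a uniform guess over 3 classes is log(3), roughly 1.1.
class_ce = float(np.log(3))

def total_loss(losses, weights):
    """Weighted sum of per-task losses, as used for the combined objective."""
    return sum(weights[name] * losses[name] for name in losses)

losses = {"sales_output": sales_mse, "class_output": class_ce}
weights = {"sales_output": 0.5, "class_output": 1.0}
print(total_loss(losses, weights))  # dominated almost entirely by the sales term

# Standardizing the regression target brings its MSE to the same order as the crossentropy.
y_scaled = (y_sales - y_sales.mean()) / y_sales.std()
scaled_mse = float(np.mean((y_scaled - y_scaled.mean()) ** 2))
print(scaled_mse)  # approximately 1.0
```

With a gap this large, a weight of 0.5 barely changes the balance. Scaling the regression target before training is a common complement to loss_weights, though the right combination depends on the data.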

Add More LSTM Depth Only When Needed

It is tempting to stack many recurrent layers immediately. Start simple first.

A deeper version would look like:

python

inputs = keras.Input(shape=(num_steps, num_features))
# return_sequences=True passes the full sequence on, which the next LSTM needs as input.
x = layers.LSTM(64, return_sequences=True)(inputs)
x = layers.LSTM(32)(x)
x = layers.Dense(32, activation="relu")(x)

That can help on harder sequence problems, but it also increases training time and the risk of overfitting.
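One way to gauge the added cost is to count trainable parameters. A standard LSTM layer with bias has four gates, each with input weights, recurrent weights, and a bias vector. The helper below (a hypothetical name, written for this illustration) applies that formula to the layer sizes used above.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters of a standard LSTM layer with bias.

    Four gates, each with (input_dim x units) input weights,
    (units x units) recurrent weights, and a bias of length units.
    """
    return 4 * (units * (input_dim + units) + units)

# Single LSTM(64) on 5 input features.
single = lstm_param_count(input_dim=5, units=64)

# Stacked LSTM(64) -> LSTM(32): the second layer sees 64 features per step.
stacked = single + lstm_param_count(input_dim=64, units=32)

print(single)   # 17920
print(stacked)  # 30336
```

The second layer adds about 70% more recurrent parameters in this configuration, which is worth weighing against the size of the training set.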

Use Shared Trunk, Task-Specific Heads

The main architectural principle is:

shared trunk for temporal representation
separate heads for task-specific output

If two tasks are strongly related, this often improves data efficiency. If the tasks are unrelated, forcing them into one model can hurt both.

So multi-task learning is not automatically better. It works best when tasks share useful structure.

Predict with Named Outputs

Prediction results come back in the same output structure.

python

predictions = model.predict(X[:2])
print(predictions["sales_output"].shape)
print(predictions["class_output"].shape)

This makes inference clean, especially when serving the model downstream.
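Downstream code usually needs class labels rather than raw probabilities. The following sketch post-processes a prediction dictionary with NumPy. The dictionary here is a hand-written stand-in for what model.predict would return on two samples, so the numbers are invented.

```python
import numpy as np

# Hypothetical stand-in for the dictionary returned by model.predict(X[:2]).
predictions = {
    "sales_output": np.array([[0.42], [0.87]], dtype="float32"),
    "class_output": np.array([[0.1, 0.7, 0.2],
                              [0.6, 0.3, 0.1]], dtype="float32"),
}

# Flatten the (batch, 1) regression output to one value per sample.
sales = predictions["sales_output"].ravel()

# Pick the most probable class and its probability for each sample.
class_ids = predictions["class_output"].argmax(axis=1)
confidence = predictions["class_output"].max(axis=1)

print(sales.shape)  # (2,)
print(class_ids)    # [1 0]
```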

Common Pitfalls

Using a Sequential model when the problem clearly needs multiple output heads.
Feeding LSTMs data that is not shaped as (batch, timesteps, features).
Ignoring task loss scaling and letting one target dominate the optimization.
Combining unrelated tasks and expecting multi-task learning to help automatically.
Forgetting to match output names with the target dictionary passed to fit().
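Two of these pitfalls, wrong input rank and mismatched target names, can be caught before calling fit(). The check below is a small sketch using NumPy only. The function name check_multitask_batch and its error messages are assumptions for illustration, not a Keras utility.

```python
import numpy as np

def check_multitask_batch(X, targets, output_names):
    """Raise ValueError if X is not 3D or the targets do not match the output names."""
    X = np.asarray(X)
    if X.ndim != 3:
        raise ValueError(f"expected (batch, timesteps, features), got shape {X.shape}")
    missing = set(output_names) - set(targets)
    extra = set(targets) - set(output_names)
    if missing or extra:
        raise ValueError(f"target keys mismatch: missing={sorted(missing)}, extra={sorted(extra)}")
    for name, y in targets.items():
        if len(y) != X.shape[0]:
            raise ValueError(f"{name} has {len(y)} rows but X has {X.shape[0]} samples")

# Example data with the shapes used in this article.
X = np.random.rand(8, 24, 5).astype("float32")
targets = {
    "sales_output": np.random.rand(8, 1).astype("float32"),
    "class_output": np.random.randint(0, 3, size=(8,)),
}
names = ["sales_output", "class_output"]

check_multitask_batch(X, targets, names)  # passes silently
```

Passing a 2D array, or a target dictionary keyed by a name that does not match any output layer, raises a ValueError with a message pointing at the problem.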

Summary

Multivariate, multi-task LSTMs consume multiple features per time step and predict multiple targets.
Keras Functional API is the right tool because it supports shared encoders and multiple output heads.
Compile the model with per-task losses and metrics.
Use loss weighting when one task overwhelms the others.
Multi-task learning works best when the tasks genuinely share useful sequence structure.