Building a multivariate, multi-task LSTM with Keras
Introduction
A multivariate multi-task LSTM takes a sequence with several features at each time step and produces multiple outputs for different prediction targets. In Keras, the clean way to build this is the Functional API: use a shared recurrent backbone to learn the sequence representation, then attach one output head per task.
What The Model Needs To Handle
There are two separate ideas in the title.
Multivariate means each time step has multiple input features, such as:
- temperature
- humidity
- pressure
- previous demand
Multi-task means the model predicts more than one target, such as:
- next-hour demand
- next-hour temperature class
The shared LSTM layers learn a common sequence encoding, and the task-specific heads learn what each target needs from that shared representation.
Input Shape For A Multivariate LSTM
Keras expects recurrent inputs in the shape:
batch x time_steps x features
For example, if each sample contains 24 time steps and 5 features per step, a batch has shape (batch, 24, 5). The shape you pass to keras.Input is (24, 5), because Keras leaves the batch dimension implicit.
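To make the layout concrete, here is a minimal NumPy sketch that slices a 2-D table of readings into overlapping windows. The helper name make_windows and the random data are hypothetical, used only to illustrate the resulting shape.

```python
import numpy as np

def make_windows(series, time_steps):
    """Slice a (n_rows, n_features) array into overlapping windows."""
    n_windows = series.shape[0] - time_steps + 1
    return np.stack([series[i:i + time_steps] for i in range(n_windows)])

# Hypothetical data: 100 hourly rows, 5 features per row.
series = np.random.default_rng(0).normal(size=(100, 5)).astype("float32")

X = make_windows(series, time_steps=24)
print(X.shape)  # (77, 24, 5) -> batch x time_steps x features
```

Each of the 77 samples is one 24-step window, and the last axis holds the 5 features recorded at each step.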
Example Model In Keras
Here is a minimal TensorFlow/Keras example with one regression head and one classification head.
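The following is a sketch under stated assumptions: 24 time steps, 5 features, a scalar demand target, and a 3-class temperature label. The layer sizes and the head names demand and temp_class are illustrative choices, not values tied to a particular dataset.

```python
from tensorflow import keras
from tensorflow.keras import layers

TIME_STEPS, N_FEATURES, N_CLASSES = 24, 5, 3  # assumed sizes

inputs = keras.Input(shape=(TIME_STEPS, N_FEATURES))

# Shared trunk: one LSTM summarizes the whole window into a single vector.
shared = layers.LSTM(64, name="shared_lstm")(inputs)

# Regression head: next-hour demand as one unbounded value.
demand = layers.Dense(1, name="demand")(shared)

# Classification head: probabilities over the temperature classes.
temp_class = layers.Dense(N_CLASSES, activation="softmax", name="temp_class")(shared)

# A dict of outputs makes the head names explicit for compile() and fit().
model = keras.Model(inputs=inputs, outputs={"demand": demand, "temp_class": temp_class})

model.compile(
    optimizer="adam",
    loss={"demand": "mse", "temp_class": "sparse_categorical_crossentropy"},
)
```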
This example shows the core architecture: one shared LSTM trunk, two task-specific outputs.
Why Shared Layers Help
The main reason to use a multi-task model is inductive transfer: if the tasks are related, training on both pushes the backbone toward temporal patterns that help each of them.
For example, a shared representation of recent sensor history may improve both:
- numeric forecasting
- anomaly category prediction
That can reduce overfitting compared with training completely separate models, because the shared layers have to fit both targets at once, which acts as a regularizer.
When To Use return_sequences=True
In the simple example above, the last LSTM output is enough because each task makes one prediction for the whole window.
If each time step needs its own prediction, or if you want to stack another recurrent layer, use return_sequences=True.
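A minimal sketch of the stacked case, assuming the same 24 x 5 input and two heads as before. The unit counts (64 and 32) and the name stacked_model are arbitrary.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(24, 5))

# return_sequences=True keeps one 64-dim vector per time step: (batch, 24, 64).
x = layers.LSTM(64, return_sequences=True)(inputs)

# The second LSTM returns only its final output: (batch, 32).
x = layers.LSTM(32)(x)

demand = layers.Dense(1, name="demand")(x)
temp_class = layers.Dense(3, activation="softmax", name="temp_class")(x)

stacked_model = keras.Model(inputs, {"demand": demand, "temp_class": temp_class})
```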
This passes the full sequence to the next recurrent layer before collapsing it to one final representation.
Loss Weighting Matters
Different tasks can have losses on very different numeric scales. If you do nothing, one task may dominate training.
Keras lets you weight the losses through the loss_weights argument of compile.
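As a sketch, the weights can be passed as a dict keyed by output name. The values 1.0 and 0.2 below are placeholders rather than recommendations, and the head names are the same illustrative ones used throughout.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(24, 5))
shared = layers.LSTM(64)(inputs)
demand = layers.Dense(1, name="demand")(shared)
temp_class = layers.Dense(3, activation="softmax", name="temp_class")(shared)
model = keras.Model(inputs, {"demand": demand, "temp_class": temp_class})

model.compile(
    optimizer="adam",
    loss={"demand": "mse", "temp_class": "sparse_categorical_crossentropy"},
    # Total loss = 1.0 * demand loss + 0.2 * temp_class loss.
    # Placeholder weights; tune them against validation results.
    loss_weights={"demand": 1.0, "temp_class": 0.2},
)
```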
This is often necessary in real multi-task systems.
Common Pitfalls
A common mistake is getting the input shape wrong. Keras LSTMs expect each sample as time_steps x features, so an array laid out as features x time_steps has to be transposed first.
Another issue is mismatching output heads and label dictionaries. The names used in Model(outputs=...), compile, and fit must line up cleanly.
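One way to keep the names aligned is to use the same string keys everywhere. The sketch below trains on random placeholder data purely to show the wiring; the head names, sizes, and metrics are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(24, 5))
shared = layers.LSTM(32)(inputs)
demand = layers.Dense(1, name="demand")(shared)
temp_class = layers.Dense(3, activation="softmax", name="temp_class")(shared)

# The same two keys appear in outputs, loss, metrics, and the label dict.
model = keras.Model(inputs, {"demand": demand, "temp_class": temp_class})
model.compile(
    optimizer="adam",
    loss={"demand": "mse", "temp_class": "sparse_categorical_crossentropy"},
    metrics={
        "demand": [keras.metrics.MeanAbsoluteError()],
        "temp_class": [keras.metrics.SparseCategoricalAccuracy()],
    },
)

# Random placeholder data: 64 windows, integer class labels in {0, 1, 2}.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 24, 5)).astype("float32")
labels = {
    "demand": rng.normal(size=(64, 1)).astype("float32"),
    "temp_class": rng.integers(0, 3, size=(64,)),
}

history = model.fit(X, labels, epochs=1, batch_size=16, verbose=0)
preds = model.predict(X[:4], verbose=0)  # dict with the same two keys
```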
Developers also often ignore loss scaling. If one task has much larger numeric loss values, the shared backbone may mostly optimize that task and neglect the others.
Finally, do not assume multi-task learning always helps. If the tasks are unrelated, forcing them to share an LSTM representation can hurt both.
Summary
- A multivariate multi-task LSTM uses sequences with multiple features and predicts more than one target.
- The Keras Functional API is the right tool because it supports shared backbones and multiple output heads.
- Input shape should be batch x time_steps x features.
- Use separate losses and metrics for each output head, and add loss weights when needed.
- Multi-task learning helps most when the tasks share useful temporal structure.

