Building a multivariate, multi-task LSTM with Keras
Introduction
A multivariate, multi-task LSTM takes several input features across time and predicts more than one target from the same sequence. The key design idea is to share one recurrent encoder across tasks, then branch into separate output heads so each target can learn its own final mapping.
Define the Problem Shape Clearly
There are two separate ideas in the title:
- multivariate: each time step has multiple features
- multi-task: the model predicts multiple outputs
For example, each sample might contain 24 hourly time steps with 5 features per step, and the model may predict:
- next-day sales as a regression output
- demand class as a classification output
That is a multi-task sequence model because one encoder supports two different prediction objectives.
Prepare Inputs as 3D Tensors
Keras LSTMs expect input as a 3D tensor of shape (batch, timesteps, features).
A small synthetic example:
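A minimal sketch of such data, assuming NumPy and purely random values. The sample count, the seed, and the integer encoding of the classes are illustrative assumptions, while the 24 time steps and 5 features follow the example above.

```python
import numpy as np

# Illustrative sizes: 24 time steps and 5 features match the running example;
# the sample count and seed are arbitrary assumptions.
n_samples, timesteps, n_features = 200, 24, 5
rng = np.random.default_rng(0)

# Multivariate sequence input with shape (batch, timesteps, features)
X = rng.normal(size=(n_samples, timesteps, n_features)).astype("float32")

# Regression target: one continuous value per sample
y_reg = rng.normal(size=(n_samples, 1)).astype("float32")

# Classification target: integer class ids 0, 1, or 2 (assumed encoding)
y_cls = rng.integers(0, 3, size=(n_samples,))

print(X.shape, y_reg.shape, y_cls.shape)
```

Random data like this is only useful for checking shapes and wiring; a model cannot learn anything meaningful from it.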
Here:
- X is the multivariate sequence input
- y_reg is a regression target
- y_cls is a 3-class classification target
Use the Functional API for Multi-Task Models
A Sequential model describes a single linear stack of layers with one input and one output, so it cannot express multiple output heads. Use the Functional API instead.
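One way this could look, as a sketch rather than a definitive architecture. It assumes TensorFlow's bundled Keras; the layer sizes and the names sequence, shared_lstm, reg, and cls are hypothetical choices made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative dimensions taken from the running example
timesteps, n_features, n_classes = 24, 5, 3

inputs = keras.Input(shape=(timesteps, n_features), name="sequence")

# Shared trunk: one LSTM that returns only its final hidden state
shared = layers.LSTM(32, name="shared_lstm")(inputs)

# Task-specific heads branching from the shared representation
reg_out = layers.Dense(1, name="reg")(shared)
cls_out = layers.Dense(n_classes, activation="softmax", name="cls")(shared)

# Dictionary keys are kept identical to the output layer names
model = keras.Model(
    inputs=inputs,
    outputs={"reg": reg_out, "cls": cls_out},
    name="multitask_lstm",
)
model.summary()
```

Keeping each dictionary key identical to its layer name is a defensive choice here, so there is no ambiguity about which name the loss and target dictionaries should use.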
The LSTM learns a shared temporal representation, and each task gets its own head.
Compile with Per-Task Losses
Different tasks often need different loss functions.
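A hedged sketch of a per-task compile call. The helper build_model and the output names reg and cls are hypothetical, and the choice of mean squared error plus sparse categorical cross-entropy assumes a continuous target and integer class labels.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(timesteps=24, n_features=5, n_classes=3):
    """Hypothetical helper: shared LSTM trunk with two named heads."""
    inputs = keras.Input(shape=(timesteps, n_features))
    shared = layers.LSTM(32)(inputs)
    outputs = {
        "reg": layers.Dense(1, name="reg")(shared),
        "cls": layers.Dense(n_classes, activation="softmax", name="cls")(shared),
    }
    return keras.Model(inputs, outputs)


model = build_model()

# One loss and one metric list per named output
model.compile(
    optimizer="adam",
    loss={
        "reg": "mse",                              # continuous target
        "cls": "sparse_categorical_crossentropy",  # integer class labels
    },
    metrics={"reg": ["mae"], "cls": ["accuracy"]},
)
```

If the class labels were one-hot encoded instead, categorical_crossentropy would be the matching loss.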
This is one of the biggest advantages of Keras multi-output models: each output can be trained with the loss that matches its semantics.
Train with a Dictionary of Targets
The fit call should mirror the output names: pass the targets as a dictionary whose keys match the names of the model outputs.
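A small end-to-end sketch of such a call, under the assumption of random placeholder data and hypothetical output names reg and cls. The epoch count and batch size are arbitrary.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random placeholder data (assumption: integer class labels 0..2)
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 24, 5)).astype("float32")
y_reg = rng.normal(size=(128, 1)).astype("float32")
y_cls = rng.integers(0, 3, size=(128,))

# Shared trunk with two named heads
inputs = keras.Input(shape=(24, 5))
shared = layers.LSTM(16)(inputs)
outputs = {
    "reg": layers.Dense(1, name="reg")(shared),
    "cls": layers.Dense(3, activation="softmax", name="cls")(shared),
}
model = keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss={"reg": "mse", "cls": "sparse_categorical_crossentropy"},
    metrics={"reg": ["mae"], "cls": ["accuracy"]},
)

# Target dictionary keys match the output names used above
history = model.fit(
    X,
    {"reg": y_reg, "cls": y_cls},
    epochs=2,
    batch_size=32,
    verbose=0,
)

# Inspect which traces were recorded
print(sorted(history.history))
```

The exact key names in history.history can differ between Keras versions, so printing them once is a safer habit than hard-coding them.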
The history object will include separate loss and metric traces for each task.
Loss Weighting Matters
One task can dominate training if its loss scale is much larger than the others. In that case, pass loss_weights to compile() so that each task's loss is scaled before the losses are summed into the total.
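A sketch of how the weights might be supplied, assuming hypothetical output names reg and cls. The values 0.2 and 1.0 are placeholders, not recommendations; suitable weights depend on the observed loss magnitudes.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Shared trunk with two named heads
inputs = keras.Input(shape=(24, 5))
shared = layers.LSTM(16)(inputs)
outputs = {
    "reg": layers.Dense(1, name="reg")(shared),
    "cls": layers.Dense(3, activation="softmax", name="cls")(shared),
}
model = keras.Model(inputs, outputs)

# Total loss = 0.2 * regression loss + 1.0 * classification loss
# (weights are illustrative assumptions)
model.compile(
    optimizer="adam",
    loss={"reg": "mse", "cls": "sparse_categorical_crossentropy"},
    loss_weights={"reg": 0.2, "cls": 1.0},
)

# Evaluate once on random placeholder data to see the combined loss
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 24, 5)).astype("float32")
targets = {
    "reg": rng.normal(size=(64, 1)).astype("float32"),
    "cls": rng.integers(0, 3, size=(64,)),
}
results = model.evaluate(X, targets, verbose=0, return_dict=True)
print(results["loss"])
```

An alternative to weighting is to standardize the regression target so its loss lands in a range comparable to the classification loss.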
This is often necessary in real multi-task systems where regression and classification losses have very different numeric ranges.
Add More LSTM Depth Only When Needed
It is tempting to stack many recurrent layers immediately. Start simple first.
A deeper version would look like:
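A possible stacked variant, sketched under the same assumptions as before (hypothetical layer sizes and output names). The one detail that matters is that every LSTM except the last must return the full sequence so the next LSTM still receives 3D input.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(24, 5))

# First LSTM returns the whole sequence: (batch, timesteps, units)
x = layers.LSTM(64, return_sequences=True)(inputs)

# Last LSTM returns only the final state: (batch, units)
x = layers.LSTM(32)(x)

# Same task-specific heads on top of the deeper trunk
outputs = {
    "reg": layers.Dense(1, name="reg")(x),
    "cls": layers.Dense(3, activation="softmax", name="cls")(x),
}
deep_model = keras.Model(inputs, outputs)
deep_model.summary()
```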
That can help on harder sequence problems, but it also increases training time and the risk of overfitting.
Use Shared Trunk, Task-Specific Heads
The main architectural principle is:
- shared trunk for temporal representation
- separate heads for task-specific output
If two tasks are strongly related, this often improves data efficiency. If the tasks are unrelated, forcing them into one model can hurt both.
So multi-task learning is not automatically better. It works best when tasks share useful structure.
Predict with Named Outputs
Prediction results come back in the same output structure.
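A short sketch of reading named predictions. It assumes the model was built with a dictionary of outputs keyed reg and cls (hypothetical names), and it uses an untrained model on random input purely to show the structure of the result.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Untrained model with dictionary outputs (for structure only)
inputs = keras.Input(shape=(24, 5))
shared = layers.LSTM(16)(inputs)
outputs = {
    "reg": layers.Dense(1, name="reg")(shared),
    "cls": layers.Dense(3, activation="softmax", name="cls")(shared),
}
model = keras.Model(inputs, outputs)

# Four new sequences of random placeholder data
X_new = np.random.default_rng(1).normal(size=(4, 24, 5)).astype("float32")

preds = model.predict(X_new, verbose=0)

reg_pred = preds["reg"]               # shape (4, 1): one value per sample
cls_prob = preds["cls"]               # shape (4, 3): softmax probabilities
cls_label = cls_prob.argmax(axis=1)   # most probable class id per sample

print(reg_pred.shape, cls_prob.shape, cls_label)
```

If the model were built with a list of outputs instead of a dictionary, predict would return a list in the same order as the outputs.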
This makes inference clean, especially when serving the model downstream.
Common Pitfalls
- Using a Sequential model when the problem clearly needs multiple output heads.
- Feeding LSTMs data that is not shaped as (batch, timesteps, features).
- Ignoring task loss scaling and letting one target dominate the optimization.
- Combining unrelated tasks and expecting multi-task learning to help automatically.
- Forgetting to match output names with the target dictionary passed to fit().
Summary
- Multivariate, multi-task LSTMs consume multiple features per time step and predict multiple targets.
- The Keras Functional API is the right tool because it supports shared encoders and multiple output heads.
- Compile the model with per-task losses and metrics.
- Use loss weighting when one task overwhelms the others.
- Multi-task learning works best when the tasks genuinely share useful sequence structure.

