How can I feed the last output y(t-1) as input for generating y(t) in a TensorFlow RNN?

Introduction

Feeding y(t-1) into the model to generate y(t) is an autoregressive sequence pattern. In TensorFlow, the design depends on the phase: during training you can feed the true previous targets, which is called teacher forcing, while during generation the model must feed back its own last prediction. The two phases are conceptually similar but need different code.

Teacher Forcing During Training

When the true previous output is available in the training data, the usual approach is to shift the target sequence and feed that shifted sequence as an input feature.

python
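import tensorflow as tf

x = tf.random.normal((4, 5, 3))        # other input features: (batch, time, features)
true_y = tf.random.normal((4, 5, 1))   # target sequence: (batch, time, 1)

# Shift the targets right by one step so position t holds y(t-1).
# The first step has no previous output, so it gets a zero placeholder.
prev_y = tf.concat([
    tf.zeros((4, 1, 1)),
    true_y[:, :-1, :]
], axis=1)

# Append y(t-1) as an extra feature on the last axis.
model_input = tf.concat([x, prev_y], axis=-1)
print(model_input.shape)  # (4, 5, 4)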

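Now each time step sees the previous true output value, and the first step sees a zero placeholder because no earlier output exists. The resulting tensor has shape (4, 5, 4): the three original features plus one previous-output feature. This is often the simplest way to train an RNN that conditions on prior outputs.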

Build a Keras RNN on the Augmented Input

Once the shifted previous output is part of the feature tensor, the model itself can be an ordinary sequence model.

python
import tensorflow as tf

inputs = tf.keras.Input(shape=(None, 4))
x = tf.keras.layers.SimpleRNN(16, return_sequences=True)(inputs)
outputs = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')

If your original feature count was 3 and you appended one previous-output feature, the input width becomes 4.
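
To make the training step concrete, here is a minimal end-to-end sketch that fits such a model on teacher-forced inputs. The data is random and the batch size, epoch count, and layer width are illustrative assumptions, so the loss values themselves carry no meaning.

```python
import tensorflow as tf

# Random stand-in data (assumption): 32 sequences, 5 steps, 3 features.
x = tf.random.normal((32, 5, 3))
true_y = tf.random.normal((32, 5, 1))

# Teacher forcing: shift the targets right by one step and append as a feature.
prev_y = tf.concat([tf.zeros((32, 1, 1)), true_y[:, :-1, :]], axis=1)
model_input = tf.concat([x, prev_y], axis=-1)   # (32, 5, 4)

inputs = tf.keras.Input(shape=(None, 4))
h = tf.keras.layers.SimpleRNN(16, return_sequences=True)(inputs)
model = tf.keras.Model(inputs, tf.keras.layers.Dense(1)(h))
model.compile(optimizer="adam", loss="mse")

# The target at step t is y(t); the input at step t contains y(t-1).
history = model.fit(model_input, true_y, epochs=2, batch_size=8, verbose=0)
print(history.history["loss"])
```

The targets passed to `fit` are the unshifted sequence, so each step is trained to predict y(t) from the features and y(t-1).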

Autoregressive Generation at Inference Time

At inference time, the true value of y(t-1) is not available for future steps. You must feed the model's own previous prediction back into the next step.

python
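import tensorflow as tf

# Assumes `model` is the Keras model built in the previous section (input width 4).
feature_steps = tf.random.normal((1, 5, 3))   # known features for the steps to predict
prev_y = tf.zeros((1, 1, 1))                  # same zero start value used in training
history = []                                  # step inputs built so far
predictions = []

for t in range(feature_steps.shape[1]):
    current_x = feature_steps[:, t:t+1, :]
    step_input = tf.concat([current_x, prev_y], axis=-1)   # (1, 1, 4)
    history.append(step_input)
    # Run the model on the whole prefix so the RNN state reflects every earlier step.
    # Calling it on a single step would restart from a zero hidden state each time.
    pred = model(tf.concat(history, axis=1))[:, -1:, :]    # keep only the newest step
    predictions.append(pred)
    prev_y = pred

pred_sequence = tf.concat(predictions, axis=1)
print(pred_sequence.shape)  # (1, 5, 1)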

This is the core autoregressive loop: predict one step, feed that prediction back, then predict the next.
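
A stepwise loop can also carry the RNN hidden state explicitly between calls, which avoids reprocessing earlier steps and keeps the state from resetting to zero on each call. The sketch below assumes the recurrent layer and the output layer are kept as separate objects (here called `rnn` and `head`, both hypothetical names) and that the layer was created with `return_state=True`. The layers are untrained in this example; in practice they would be the layers used during training.

```python
import tensorflow as tf

units = 16
# return_state=True makes the layer also return its final hidden state.
rnn = tf.keras.layers.SimpleRNN(units, return_sequences=True, return_state=True)
head = tf.keras.layers.Dense(1)

def generate(feature_steps):
    """Roll out predictions one step at a time, carrying the hidden state."""
    batch = feature_steps.shape[0]
    prev_y = tf.zeros((batch, 1, 1))      # zero start value, as in training
    state = tf.zeros((batch, units))      # zero initial hidden state
    predictions = []
    for t in range(feature_steps.shape[1]):
        step_input = tf.concat([feature_steps[:, t:t+1, :], prev_y], axis=-1)
        out, state = rnn(step_input, initial_state=state)   # out: (batch, 1, units)
        prev_y = head(out)                                  # (batch, 1, 1)
        predictions.append(prev_y)
    return tf.concat(predictions, axis=1)

pred_sequence = generate(tf.random.normal((2, 5, 3)))
print(pred_sequence.shape)
```

Passing `initial_state` on every call is what links the steps together; each call processes one new time step instead of the full prefix.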

Why Training and Inference Differ

Teacher forcing makes training easier because the model always sees the correct previous output. Autoregressive inference is harder because prediction errors can accumulate over time.

That mismatch is normal in sequence modeling. The important design choice is to be explicit about it. Do not assume the exact same input pipeline serves both phases without adjustment.
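
The accumulation effect can be seen in a toy calculation. The sketch below is not an RNN: it assumes a hypothetical one-parameter model y(t) = a * y(t-1) whose coefficient is 2% too large (the values 1.0 and 1.02 are illustrative), and compares the error under teacher forcing with the error when predictions are fed back.

```python
# Toy illustration (assumed values): the true process keeps y constant,
# while the model multiplies the previous value by 1.02.
true_a, model_a = 1.0, 1.02
steps = 20

true_y = [1.0]
for _ in range(steps):
    true_y.append(true_a * true_y[-1])

# Teacher forcing: every prediction starts from the true previous value.
forced_err = [abs(model_a * true_y[t - 1] - true_y[t]) for t in range(1, steps + 1)]

# Free running: every prediction starts from the model's own previous prediction.
free_err, prev = [], true_y[0]
for t in range(1, steps + 1):
    prev = model_a * prev
    free_err.append(abs(prev - true_y[t]))

print(round(forced_err[-1], 4), round(free_err[-1], 4))
```

Under these assumptions the teacher-forced error stays at 0.02 per step, while the free-running error compounds to roughly 0.49 after 20 steps.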

Alternative: Put the Recurrence into the Cell State

Sometimes the better answer is not to feed y(t-1) as an explicit input feature at all. Many sequence problems can be modeled by letting the RNN hidden state carry the needed information. Explicit previous-output feedback is most useful when the actual last output value is part of the problem formulation, such as decoder-style generation or some time-series forecasting setups.
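
For comparison, a state-only model might look like the following sketch. It assumes the same three input features used earlier and simply omits the previous-output feature, so the whole sequence can be predicted in one call without a feedback loop.

```python
import tensorflow as tf

# Input width is 3: no y(t-1) feature is appended.
inputs = tf.keras.Input(shape=(None, 3))
h = tf.keras.layers.SimpleRNN(16, return_sequences=True)(inputs)
outputs = tf.keras.layers.Dense(1)(h)
state_only_model = tf.keras.Model(inputs, outputs)

x = tf.random.normal((4, 5, 3))
pred = state_only_model(x)   # all time steps predicted in a single call
print(pred.shape)
```

Because nothing is fed back, training and inference use the same input pipeline in this variant.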

Common Pitfalls

  • Mixing teacher forcing and inference logic as if they were identical.
  • Forgetting to shift the target sequence when building y(t-1) for training.
  • Feeding the current target into the same step by mistake, which leaks future information.
  • Assuming the hidden state alone always replaces explicit autoregressive inputs.
  • Ignoring error accumulation during generation when the model feeds back its own predictions.
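
The shifting and leakage pitfalls can be guarded against with a small check. The helper name `shift_targets` below is an illustrative assumption, not part of TensorFlow; the assertions confirm that step t holds the target from step t-1 and that the first step holds the zero placeholder.

```python
import tensorflow as tf

def shift_targets(true_y):
    """Return y(t-1) for every step, with zeros at the first step."""
    start = tf.zeros_like(true_y[:, :1, :])
    return tf.concat([start, true_y[:, :-1, :]], axis=1)

true_y = tf.random.normal((4, 5, 1))
prev_y = shift_targets(true_y)

# Step t must hold the target from step t-1, never the target from step t.
tf.debugging.assert_equal(prev_y[:, 1:, :], true_y[:, :-1, :])
# The first step has no previous output, so it must be the zero placeholder.
tf.debugging.assert_equal(prev_y[:, :1, :], tf.zeros((4, 1, 1)))
print(prev_y.shape)
```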

Summary

  • Feeding y(t-1) into the model is an autoregressive sequence pattern.
  • During training, use teacher forcing by shifting the true target sequence.
  • During inference, feed back the model's own last prediction.
  • Both phases can be built from ordinary Keras sequence layers plus explicit preprocessing.
  • Be explicit about the difference between training inputs and generation-time inputs.
