Run Prediction from a Saved Model in TensorFlow 2.0

Introduction

Running prediction from a saved TensorFlow model in TensorFlow 2 usually comes down to loading the artifact correctly and matching the input contract used during training. Most inference bugs are caused by shape, dtype, or preprocessing mismatches rather than by the prediction call itself.

Load the Saved Model the Right Way

If the artifact was saved from a Keras model, the simplest path is tf.keras.models.load_model().

python

import numpy as np
import tensorflow as tf

# Synthetic data: 4 features per row; the label is 1 when the row sum exceeds 2.0.
x_train = np.random.rand(256, 4).astype("float32")
y_train = (x_train.sum(axis=1) > 2.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=3, verbose=0)

# A path without a file extension writes a SavedModel directory in TF 2.x.
# Keras 3 expects a .keras or .h5 extension here instead.
model.save("artifacts/binary_model")

# Restore the Keras model from the SavedModel directory.
loaded = tf.keras.models.load_model("artifacts/binary_model")

At that point, you can call predict() directly:

python

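# One example with a leading batch dimension: shape (1, 4).
sample = np.array([[0.2, 0.1, 0.7, 0.4]], dtype="float32")
prediction = loaded.predict(sample, verbose=0)
print(prediction)  # array of shape (1, 1) holding the sigmoid score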

This is the standard approach when you own the model and saved it through Keras.
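Because the final layer in this example is a sigmoid, each output row is a score between 0 and 1. A minimal sketch of turning such scores into class labels follows. The `to_labels` helper and the 0.5 cutoff are illustrative assumptions rather than part of the saved model, and the array stands in for a real predict() result so the snippet runs without TensorFlow.

```python
import numpy as np


def to_labels(scores, threshold=0.5):
    """Convert sigmoid scores of shape (batch, 1) into 0/1 labels."""
    flat = np.asarray(scores, dtype="float32").reshape(-1)
    return (flat >= threshold).astype("int64")


# Stand-in for loaded.predict(...): three rows, one score each.
example_scores = np.array([[0.12], [0.87], [0.50]], dtype="float32")
labels = to_labels(example_scores)
print(labels)
```

The right cutoff depends on the cost of false positives versus false negatives, so treat 0.5 as a placeholder.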

Validate Shape and Dtype Before Predicting

Production inference code should not assume inputs are already well formed. Centralize basic checks before sending data into the model.

python

import numpy as np

# Inspect the input contract the loaded model expects.
print("expected input shape:", loaded.input_shape)
print("expected dtype:", loaded.inputs[0].dtype)


def prepare_features(raw):
    # Cast to float32 and promote a single example to a batch of one.
    arr = np.asarray(raw, dtype="float32")
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)
    # Compare everything after the batch dimension with the model's input shape.
    if arr.shape[1:] != loaded.input_shape[1:]:
        raise ValueError(f"expected {loaded.input_shape[1:]}, got {arr.shape[1:]}")
    return arr


features = prepare_features([0.3, 0.4, 0.2, 0.9])
print(loaded.predict(features, verbose=0))

Those small checks catch many serving bugs immediately instead of letting them turn into obscure TensorFlow errors later.
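Shape is only half of the contract. NumPy builds float64 arrays from Python floats by default, and NaN values pass a shape check without complaint. The sketch below is one possible complement to the shape validation, assuming you want to reject non-numeric and non-finite inputs outright; `check_numeric` is a hypothetical helper name.

```python
import numpy as np


def check_numeric(raw):
    """Return a float32 array, rejecting non-numeric and non-finite input."""
    arr = np.asarray(raw)
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(f"expected numeric input, got dtype {arr.dtype}")
    arr = arr.astype("float32")
    if not np.isfinite(arr).all():
        raise ValueError("input contains NaN or infinite values")
    return arr


default = np.asarray([0.3, 0.4, 0.2, 0.9])       # NumPy's default dtype for floats
checked = check_numeric([0.3, 0.4, 0.2, 0.9])    # explicitly cast for the model
print(default.dtype, checked.dtype)
```

Whether to reject or impute missing values is a design choice. Whatever you choose should mirror what the training pipeline did.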

Use SavedModel Signatures When the Export Contract Matters

If the model will be served or consumed by another tool, inspect the SavedModel signatures. They are often the real public interface.

python

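import tensorflow as tf

# Low-level load: returns a trackable object, not a Keras model.
artifact = tf.saved_model.load("artifacts/binary_model")
print("available signatures:", list(artifact.signatures.keys()))

# Signatures are concrete functions that return a dict of named output tensors.
# If the positional call is rejected, pass the tensor by its input name,
# which is listed in serve_fn.structured_input_signature.
serve_fn = artifact.signatures["serving_default"]
result = serve_fn(tf.constant([[0.1, 0.2, 0.3, 0.4]], dtype=tf.float32))
print(result)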

This is useful when you need named outputs or you are integrating with a serving system that expects the exported signature rather than a Keras convenience method.

Wrap Loading and Preprocessing in One Predictor

In real applications, prediction code is easier to maintain when model loading, input validation, and postprocessing live in one place.

python

import numpy as np
import tensorflow as tf


class Predictor:
    def __init__(self, model_path):
        # Load once at construction and cache the per-example input shape.
        self.model = tf.keras.models.load_model(model_path)
        self.expected_shape = self.model.input_shape[1:]

    def predict_score(self, raw):
        # Validate a single example, then return its score as a Python float.
        arr = np.asarray(raw, dtype="float32")
        if arr.ndim == 1:
            arr = arr.reshape(1, -1)
        if arr.shape[1:] != self.expected_shape:
            raise ValueError(f"expected {self.expected_shape}, got {arr.shape[1:]}")
        return float(self.model.predict(arr, verbose=0)[0][0])

This prevents every API route or notebook cell from reinventing the same inference rules slightly differently.

Common Pitfalls

The biggest mistake is forgetting that the saved model expects the same preprocessing used during training. Feature order, normalization, categorical encoding, and missing-value handling all need to match.
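One way to keep preprocessing aligned is to store the training-time contract next to the model and apply it at inference. The sketch below assumes a simple standardization scheme. The feature names, statistics, file name, and helper functions are all hypothetical, and it uses only the standard library so it runs on its own.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical training-time contract: feature order plus per-feature mean/std.
contract = {
    "feature_order": ["age", "income", "tenure", "score"],
    "mean": [40.0, 50000.0, 5.0, 0.5],
    "std": [10.0, 20000.0, 2.0, 0.25],
}


def save_contract(path, contract):
    Path(path).write_text(json.dumps(contract))


def load_contract(path):
    return json.loads(Path(path).read_text())


def preprocess(record, contract):
    """Order and standardize one record (a dict) as the contract dictates."""
    row = []
    for name, mean, std in zip(
        contract["feature_order"], contract["mean"], contract["std"]
    ):
        if record.get(name) is None:
            # This sketch rejects missing values; imputing is another option.
            raise ValueError(f"missing feature: {name}")
        row.append((float(record[name]) - mean) / std)
    return [row]  # one row, i.e. a batch of size 1


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preprocessing.json"
    save_contract(path, contract)
    restored = load_contract(path)

# Key order in the incoming record does not matter; the contract decides.
record = {"score": 0.75, "age": 50, "tenure": 5, "income": 30000}
features = preprocess(record, restored)
print(features)
```

The nested list can be passed to np.asarray(..., dtype="float32") before prediction.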

Another common issue is shape confusion. A single example still usually needs a batch dimension, which is why a vector often has to be reshaped into one row before prediction.

People also mix up Keras loading and low-level SavedModel loading. load_model() is the simplest path for Keras models, while tf.saved_model.load() is more appropriate when you care about exported signatures and lower-level serving behavior.

Finally, do not reload the model for every request in an API. Load it once, validate inputs, and reuse the loaded object.
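A minimal sketch of the load-once pattern, using functools.lru_cache to memoize the loader per path. The `make_model_getter` helper and the `fake_load` stand-in are assumptions made so the example runs without TensorFlow. In a real service the loader passed in would be something like tf.keras.models.load_model.

```python
from functools import lru_cache


def make_model_getter(load_fn):
    """Return a getter that loads each model path at most once per process."""

    @lru_cache(maxsize=None)
    def get_model(model_path):
        return load_fn(model_path)

    return get_model


# Stand-in loader that records how often it is called.
load_calls = []


def fake_load(model_path):
    load_calls.append(model_path)
    return object()


get_model = make_model_getter(fake_load)

first = get_model("artifacts/binary_model")
second = get_model("artifacts/binary_model")  # served from the cache
print(first is second, len(load_calls))
```

Request handlers can then call get_model(path) freely, and only the first call pays the loading cost.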

Summary

Use tf.keras.models.load_model() for ordinary Keras SavedModel inference.
Match input shape, dtype, and preprocessing to the training-time contract.
Add a batch dimension when predicting on a single example.
Use tf.saved_model.load() when you need to inspect or call exported signatures.
Most prediction failures come from bad inputs, not from the prediction API itself.