How to create a neural network for regression?

Introduction

A neural network for regression predicts a continuous number such as price, temperature, or demand. The high-level workflow looks similar to classification, but the output layer, loss function, and evaluation metrics are different. If you set those pieces correctly and prepare the data carefully, a small feed-forward network can become a strong baseline.

Start with the Data

Regression quality depends at least as much on data preparation as on architecture. The network needs numeric features, a numeric target, and a train-validation-test split that reflects how the model will be used.

Two preparation steps matter in almost every project:

scale the input features so one column does not dominate the gradients
keep the validation and test sets completely separate from training

The following example uses TensorFlow and synthetic data so it can run as-is:

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(42)

# Three input features.
X = rng.normal(size=(1000, 3)).astype("float32")

# A continuous target with noise.
y = (
    3.5 * X[:, 0]
    - 1.2 * X[:, 1]
    + 0.8 * X[:, 2] ** 2
    + rng.normal(scale=0.3, size=1000)
).astype("float32")

# Train, validation, test split (700 / 150 / 150 rows).
X_train, X_val, X_test = X[:700], X[700:850], X[850:]
y_train, y_val, y_test = y[:700], y[700:850], y[850:]

# Standardize using only training statistics.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_val = (X_val - mean) / std
X_test = (X_test - mean) / std
```

Notice that the scaling parameters come from the training set only. Recomputing them on validation or test data would leak information and produce overly optimistic metrics.
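
One way to make the train-only rule harder to break is to wrap the statistics in a pair of small helpers. The sketch below is illustrative: `fit_standardizer` and `apply_standardizer` are hypothetical names, not part of NumPy or TensorFlow, and the zero-variance guard is an added assumption for columns that happen to be constant.

```python
import numpy as np


def fit_standardizer(X_train):
    """Compute per-column mean and std from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Assumption: treat a zero-variance column as std 1 to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_standardizer(X, mean, std):
    """Scale any split with statistics that were fitted on training data."""
    return (X - mean) / std


demo_rng = np.random.default_rng(0)
X_demo = demo_rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X_demo[:, 2] = 7.0  # a constant column, to exercise the guard
X_demo_train, X_demo_val = X_demo[:150], X_demo[150:]

demo_mean, demo_std = fit_standardizer(X_demo_train)
X_demo_train_scaled = apply_standardizer(X_demo_train, demo_mean, demo_std)
X_demo_val_scaled = apply_standardizer(X_demo_val, demo_mean, demo_std)
```

The validation split is scaled with `demo_mean` and `demo_std` from the training rows, so its columns will not have exactly zero mean or unit variance. That is expected.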

Build a Regression Network

For tabular regression, start simple. One or two hidden layers with relu activations are often enough for a baseline. The output layer should usually have one unit and no activation (a linear output), so the model can predict any real-valued number rather than being squashed into a fixed range.

```python
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1)  # single linear output unit
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss="mse",
    metrics=["mae"]
)

# Stop once validation loss has not improved for 20 epochs
# and roll back to the best weights seen so far.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=20,
    restore_best_weights=True
)

history = model.fit(
    X_train,
    y_train,
    validation_data=(X_val, y_val),
    epochs=300,
    batch_size=32,
    verbose=0,
    callbacks=[early_stopping]
)
```

There are a few important choices here:

Dense(1) is the correct output for a single numeric target.
mse is a common training loss because squaring the error penalizes large misses more heavily than small ones.
mae is useful alongside mse because it is expressed in the same units as the target, which makes it easier to interpret.
Early stopping halts training once validation loss stops improving and, with restore_best_weights=True, restores the weights from the best epoch.
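
To see how mse and mae react differently to large errors, it can help to compute both by hand. The arrays below are made-up values chosen for the comparison, not output from the model above, and the `mse` and `mae` helpers are plain NumPy stand-ins for the Keras versions.

```python
import numpy as np

y_true = np.array([10.0, 10.0, 10.0, 10.0])

# Two hypothetical sets of predictions with the same total absolute error (4.0).
spread_out = np.array([11.0, 11.0, 11.0, 11.0])    # four errors of 1
one_big_miss = np.array([10.0, 10.0, 10.0, 14.0])  # one error of 4


def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


print(mae(y_true, spread_out), mse(y_true, spread_out))      # 1.0 1.0
print(mae(y_true, one_big_miss), mse(y_true, one_big_miss))  # 1.0 4.0
```

Both prediction sets have the same mae, but the single large miss makes mse four times higher. This is the sense in which a model trained on mse is pushed harder to avoid large errors.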

Evaluate the Model Properly

A low training loss is not the goal. The goal is a model that generalizes. After training, evaluate on the held-out test set and inspect a few predictions.

```python
# evaluate() returns the loss first, then the compiled metrics in order.
test_loss, test_mae = model.evaluate(X_test, y_test, verbose=0)
print(f"Test MSE: {test_loss:.4f}")
print(f"Test MAE: {test_mae:.4f}")

# Spot-check a few individual predictions against the true values.
predictions = model.predict(X_test[:5], verbose=0).flatten()
for actual, predicted in zip(y_test[:5], predictions):
    print(f"actual={actual:.3f}, predicted={predicted:.3f}")
```

For real projects, go one step further and inspect residuals. If the model consistently underpredicts large targets or fails on certain feature ranges, the issue may be with data coverage rather than model depth.
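
A minimal sketch of such a residual check, using NumPy only. The function name `mean_residual_by_bin` and the choice of four quantile bins are assumptions for illustration, and the toy arrays stand in for `y_test` and the model's predictions.

```python
import numpy as np


def mean_residual_by_bin(y_true, y_pred, n_bins=4):
    """Group samples into quantile bins of the target and average the residuals.

    A residual is actual minus predicted, so a positive mean in a bin
    means the model underpredicts there.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # Interior quantile edges, e.g. the 25th, 50th, 75th percentiles for 4 bins.
    edges = np.quantile(y_true, np.linspace(0, 1, n_bins + 1)[1:-1])
    bin_ids = np.digitize(y_true, edges)
    # Note: heavily tied targets can leave a bin empty, which yields NaN.
    return [float(residuals[bin_ids == b].mean()) for b in range(n_bins)]


# Toy data: predictions are fine for small targets but too low for large ones.
y_true_demo = np.arange(1.0, 9.0)           # 1, 2, ..., 8
y_pred_demo = np.minimum(y_true_demo, 6.0)  # predictions are capped at 6

print(mean_residual_by_bin(y_true_demo, y_pred_demo))  # [0.0, 0.0, 0.0, 1.5]
```

A mean residual near zero in the lower bins and a clearly positive one in the top bin is the pattern of a model that underpredicts large targets.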

You should also compare the neural network against simpler baselines such as linear regression or gradient boosted trees. On many structured datasets, those baselines are surprisingly competitive, and they give you a better sense of whether the neural network is earning its complexity.
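
As a rough sketch of the simplest such baseline, ordinary least squares can be fitted with NumPy alone. The data is regenerated with the same formula as the synthetic example earlier so that the block runs on its own; `fit_linear_baseline` and `predict_linear` are hypothetical helper names.

```python
import numpy as np


def fit_linear_baseline(X, y):
    """Ordinary least squares with an intercept, solved via np.linalg.lstsq."""
    X_with_bias = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
    return coef


def predict_linear(X, coef):
    """Apply the fitted coefficients (last entry is the intercept)."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


base_rng = np.random.default_rng(42)
X_base = base_rng.normal(size=(1000, 3))
y_base = (
    3.5 * X_base[:, 0]
    - 1.2 * X_base[:, 1]
    + 0.8 * X_base[:, 2] ** 2
    + base_rng.normal(scale=0.3, size=1000)
)

# Fit on the first 700 rows, score on the last 150, mirroring the earlier split.
coef = fit_linear_baseline(X_base[:700], y_base[:700])
baseline_pred = predict_linear(X_base[850:], coef)
baseline_mae = float(np.mean(np.abs(baseline_pred - y_base[850:])))
print(f"Linear baseline test MAE: {baseline_mae:.4f}")
```

Because this synthetic target contains a squared term, a purely linear model cannot capture it, so a well-trained network would be expected to reach a lower MAE here. If it does not, the problem is more likely in training or preprocessing than in the architecture.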

When to Adjust the Architecture

If the model underfits, try one change at a time:

add a small number of hidden units
train a bit longer
improve features
tune the learning rate

If the model overfits, reduce capacity or add regularization. Typical tools are:

fewer layers or fewer units
Dropout
L2 weight decay
more training data
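
A sketch of how two of these tools might be attached to the earlier model. The dropout rate of 0.2 and the L2 factor of 1e-4 are illustrative starting points, not tuned values, and `regularized_model` is a name chosen for this example.

```python
import tensorflow as tf

regularized_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(
        32,
        activation="relu",
        # L2 penalty on this layer's weights (assumed strength).
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),
    ),
    # Randomly zeroes 20% of activations during training only.
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(
        16,
        activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),
    ),
    tf.keras.layers.Dense(1),  # still a single linear output
])

regularized_model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```

With a kernel regularizer attached, the loss Keras reports includes the penalty term, so it is no longer a pure mse. The mae metric is unaffected and remains comparable between the regularized and unregularized models.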

The wrong instinct is to stack many layers immediately. For ordinary regression, extra depth often makes training less stable without solving the real problem.

Common Pitfalls

Using a classification output. A softmax or sigmoid output is wrong for ordinary regression unless the target has a special bounded interpretation.
Skipping feature scaling. Unscaled tabular inputs often slow training and can make optimization noisy.
Measuring only training loss. Always watch validation and test metrics.
Leaking test information into preprocessing. Fit scalers, encoders, and imputers on training data only.
Ignoring the target distribution. Extremely skewed targets may benefit from a log transform or a different loss function.
Assuming a bigger network is automatically better. Data quality and feature design usually matter more than raw layer count.
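
For the skewed-target pitfall, a log transform is applied to the target before training and inverted after prediction. A minimal sketch with NumPy, assuming the target is non-negative; the price values are invented for illustration.

```python
import numpy as np

# Hypothetical right-skewed target, e.g. prices with a few very large values.
prices = np.array([12.0, 15.0, 18.0, 22.0, 30.0, 45.0, 900.0, 12000.0])

# Train the network on the transformed target instead of the raw one.
log_prices = np.log1p(prices)

# After prediction, map model outputs back to the original units.
recovered = np.expm1(log_prices)

print(f"raw range: {prices.min():.1f} to {prices.max():.1f}")
print(f"log range: {log_prices.min():.2f} to {log_prices.max():.2f}")
```

Metrics computed on the log scale are not in target units, so invert the transform before reporting mae.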

Summary

Regression networks predict continuous values, not classes.
Use a single linear output unit for a single numeric target.
Scale features with training-set statistics only.
Train with a regression loss such as mse and monitor a human-readable metric such as mae.
Evaluate against a held-out test set and compare with simpler baselines before increasing model complexity.