
How to decide the size of layers in Keras' Dense method?

Introduction

Choosing the number of units in a Keras Dense layer is not a matter of memorizing one formula. The right size depends on the problem, the amount of data, the complexity of the patterns you expect the network to learn, and how much overfitting you can tolerate.

Start with the Constraints of the Problem

Some layer sizes are dictated by the task. The input shape comes from your features, and the output size depends on what you are predicting. Hidden layers are the part you tune.

For a binary classification problem with 20 input features, the final layer usually has one unit with a sigmoid activation:

```python
from tensorflow import keras

# 20 input features -> two hidden layers -> one sigmoid output unit
model = keras.Sequential(
    [
        keras.layers.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),  # hidden sizes are tunable
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # probability of the positive class
    ]
)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # prints output shapes and parameter counts per layer
```

The 64 and 32 values are design choices. Together with the output layer they give the model about 3,500 trainable parameters: enough to learn moderate nonlinearity, but not so many that the model becomes unreasonably expensive for ordinary tabular data.
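The idea that the output layer follows from the task can be sketched as a small lookup. The helper name `output_layer_spec` and its task labels are hypothetical, chosen only for this illustration; the unit counts, activations, and loss names are the usual Keras pairings, and the multiclass case assumes integer class labels.

```python
def output_layer_spec(task, num_classes=None):
    """Suggest final Dense layer settings for a task (illustrative helper)."""
    if task == "binary":
        # One unit with a sigmoid gives the probability of the positive class.
        return {"units": 1, "activation": "sigmoid", "loss": "binary_crossentropy"}
    if task == "multiclass":
        if num_classes is None or num_classes < 2:
            raise ValueError("multiclass needs num_classes >= 2")
        # One unit per class; this loss assumes integer labels, not one-hot vectors.
        return {
            "units": num_classes,
            "activation": "softmax",
            "loss": "sparse_categorical_crossentropy",
        }
    if task == "regression":
        # A single linear unit (activation=None) predicts an unbounded value.
        return {"units": 1, "activation": None, "loss": "mse"}
    raise ValueError(f"unknown task: {task!r}")


print(output_layer_spec("binary"))
print(output_layer_spec("multiclass", num_classes=5))
```

The `units` and `activation` values would go to the final `keras.layers.Dense(...)` call, and `loss` to `model.compile(...)`.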

Use Model Capacity as a Tuning Lever

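A Dense layer with more units has more parameters: each unit adds one weight per input plus one bias, so a layer with n inputs and u units has n × u + u parameters. More parameters increase representational power, but they also increase the chance of overfitting and slow training.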
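To see how quickly parameters grow with width, the count can be worked out by hand. This is a minimal sketch in plain Python, assuming every Dense layer keeps its bias term (the Keras default); the function names are made up for the example.

```python
def dense_params(n_inputs, units):
    """Parameters in one Dense layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * units + units


def mlp_params(n_features, layer_units):
    """Total parameters for a stack of Dense layers fed by n_features inputs."""
    total = 0
    n_inputs = n_features
    for units in layer_units:
        total += dense_params(n_inputs, units)
        n_inputs = units  # this layer's output feeds the next layer
    return total


print(mlp_params(20, [32, 16, 1]))   # 1217
print(mlp_params(20, [64, 32, 1]))   # 3457
print(mlp_params(20, [128, 64, 1]))  # 11009
```

Doubling the width of both hidden layers roughly triples the parameter count here, because the hidden-to-hidden weight matrix grows with the product of the two widths.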

For tabular data, a good first attempt is often one or two hidden layers with 32 to 256 units each. If the model underfits (training and validation scores are both poor), increase capacity. If training accuracy is much higher than validation accuracy, the model is overfitting: reduce capacity or add regularization.

This small experiment compares a compact network with a larger one:

```python
from tensorflow import keras


def build_model(units):
    """Build a binary classifier whose second hidden layer is half as wide as the first."""
    model = keras.Sequential(
        [
            keras.layers.Input(shape=(20,)),
            keras.layers.Dense(units, activation="relu"),
            keras.layers.Dense(units // 2, activation="relu"),
            keras.layers.Dense(1, activation="sigmoid"),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


small_model = build_model(32)   # hidden layers of 32 and 16 units
large_model = build_model(128)  # hidden layers of 128 and 64 units
```

When both models are trained on the same dataset, compare validation loss, not just training accuracy. The larger model is not automatically better.
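One way to make that comparison is to read the `val_loss` list from the `history` dictionary that `model.fit` returns on its `History` object. The sketch below assumes that dictionary shape; the helper names are hypothetical and the numbers are invented placeholders, not results from a real training run.

```python
def best_val_loss(history):
    """Return (epoch_index, value) of the lowest validation loss in a history dict."""
    losses = history["val_loss"]
    epoch = min(range(len(losses)), key=losses.__getitem__)
    return epoch, losses[epoch]


def pick_model(histories):
    """Return the name of the model whose best validation loss is lowest."""
    return min(histories, key=lambda name: best_val_loss(histories[name])[1])


# Placeholder values for illustration only; real ones come from fit(...).history.
histories = {
    "small": {"accuracy": [0.80, 0.85, 0.88], "val_loss": [0.52, 0.45, 0.43]},
    "large": {"accuracy": [0.86, 0.93, 0.98], "val_loss": [0.48, 0.44, 0.51]},
}

print(pick_model(histories))  # small
```

In this made-up example the large model reaches higher training accuracy, yet its validation loss rises in the last epoch, so the small model is selected.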

Match Architecture to Data Type

Dense layers are common for tabular data and for the final classifier head of larger models. They are usually not the first thing you tune for images or text, where convolutional or sequence layers often do most of the representational work.

A practical rule set is:

  • For small structured datasets, keep hidden layers modest.
  • For high-dimensional embeddings or learned feature vectors, larger dense layers can make sense.
  • For very small datasets, fewer units are often better because the model has less room to memorize noise.

If you already have strong engineered features, a shallow network may beat a deeper one because the problem is simpler than it first appears.

Regularize Before Making the Network Huge

If a network starts overfitting, adding more units is usually the wrong direction. First try weight regularization (such as an L2 penalty), dropout, or early stopping.

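```python
from tensorflow import keras

regularized_model = keras.Sequential(
    [
        keras.layers.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),  # randomly zero 30% of activations during training
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ]
)

# Stop once val_loss has not improved for 5 epochs and roll back to the best weights.
# After compiling the model, pass callbacks=[callback] to fit() together with
# validation data, since the callback monitors val_loss.
callback = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)
```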

This gives you a more stable baseline for deciding whether the layer sizes are actually too small or whether the problem is excessive variance.
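To make the `patience=5` and `restore_best_weights=True` settings concrete, here is a simplified pure-Python sketch of the stopping rule. It is an approximation for illustration, not the Keras implementation: the function name `simulate_early_stopping` and the loss values are made up, and options such as `min_delta` are ignored.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; best_epoch is the epoch whose weights would be kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(val_losses) - 1, best_epoch


# Invented validation losses: the best value appears at epoch index 5.
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50]
print(simulate_early_stopping(losses, patience=5))  # (10, 5)
```

The brief rise at epochs 3 and 4 does not trigger a stop because the loss improves again at epoch 5, which is why a patience above 1 is useful with noisy validation curves.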

Common Pitfalls

A common mistake is choosing hidden layer sizes based on superstition, such as always halving from one layer to the next. That pattern can work, but it is a heuristic, not a rule. Let validation performance decide.

Another mistake is making every model large by default. If your dataset has only a few thousand rows, a stack of very wide dense layers can memorize the training set and produce impressive but misleading metrics.

It is also easy to confuse input dimension with hidden layer size. A dataset with 500 features does not mean your first dense layer must have 500 or more units. Sometimes a smaller layer helps the network learn a useful compressed representation.

Finally, tune one thing at a time. If you change optimizer, learning rate, batch size, layer count, and units together, you will not know which choice actually improved the model.
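A one-variable sweep can be sketched by copying a baseline configuration and changing a single key per run. The configuration keys, their values, and the `sweep_one` helper are assumptions for this illustration; each resulting dictionary would be handed to your own build-and-train code.

```python
# Hypothetical baseline; every run shares these settings except the swept key.
base_config = {
    "units": 64,
    "learning_rate": 1e-3,
    "batch_size": 32,
    "optimizer": "adam",
}


def sweep_one(base, name, values):
    """Return one config per value, varying only `name` and leaving `base` untouched."""
    if name not in base:
        raise KeyError(f"{name!r} is not a known setting")
    return [{**base, name: value} for value in values]


runs = sweep_one(base_config, "units", [32, 64, 128])
for run in runs:
    print(run)
```

Because only `units` differs between runs, any change in validation loss can be attributed to layer width rather than to the optimizer or batch size.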

Summary

  • Input and output sizes come from the problem; hidden layer sizes are tuned.
  • Start with small or medium dense layers and scale up only if validation results justify it.
  • Wider layers add capacity but also add parameters and overfitting risk.
  • Choose sizes based on data type, dataset size, and validation behavior.
  • Use regularization and early stopping before assuming you need a much larger network.
