Approximating the sine function with a neural network
Introduction
Approximating sin(x) with a neural network is a standard toy problem because the target function is smooth, bounded, and easy to sample. It is useful for understanding how network size, activation choice, training data range, and extrapolation affect a regression model.
Why this problem is a good learning example
A feedforward network can approximate many smooth functions, and sin(x) gives you a target whose shape is familiar enough to inspect visually. If the model is doing something wrong, you can usually see it immediately in the prediction curve.
This task also exposes an important limitation: fitting a function well inside the training interval is not the same as learning its true mathematical rule everywhere.
A small Keras model
For sine approximation, a compact multilayer perceptron is enough. tanh is a natural activation here because it is smooth and symmetric around zero.
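A minimal sketch of such a model is shown here. The layer width (32), depth (two hidden layers), learning rate, epoch count, and sample count are illustrative assumptions, not values taken from the original article, and the code assumes TensorFlow's bundled Keras is installed.

```python
import numpy as np
from tensorflow import keras

# Sample the target on [-pi, pi]. The sample count is an arbitrary choice.
rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, size=(1000, 1)).astype("float32")
y_train = np.sin(x_train)

# Two small tanh hidden layers and a linear output for regression.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(32, activation="tanh"),
    keras.layers.Dense(32, activation="tanh"),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss="mse")
history = model.fit(x_train, y_train, epochs=100, batch_size=64, verbose=0)

# Check the fit on an evenly spaced grid inside the training interval.
x_test = np.linspace(-np.pi, np.pi, 200, dtype="float32").reshape(-1, 1)
y_pred = model.predict(x_test, verbose=0)
print("test MSE:", float(np.mean((y_pred - np.sin(x_test)) ** 2)))
```

The output layer has no activation because the task is regression; tanh on the output would also work here since sin(x) stays within [-1, 1], but a linear output is the more general habit.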
A network of this size is enough to learn a reasonable approximation on the interval from -pi to pi.
Data range matters more than people expect
If you only train on one small interval, the network learns behavior for that interval, not periodicity as a principle. It may fit sin(x) well near the samples and then drift badly outside the training range.
For example, if you train only on [-pi, pi] and test at 4 * pi, where sin(x) is exactly zero, a plain multilayer perceptron may produce a value that is nowhere near zero. That is not a bug; it is a reminder that neural networks interpolate from the data they see.
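The gap between interpolation and extrapolation can be sketched without a deep-learning framework. The example below is a stand-in, not a trained multilayer perceptron: it uses one tanh hidden layer whose weights are fixed at random values and fits only the output weights by least squares. The hidden size, weight distributions, and the helper names hidden and predict are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden layer with fixed random weights (a simplification; a trained
# MLP would learn these as well).
n_hidden = 50
w = rng.normal(0.0, 1.0, size=n_hidden)
b = rng.uniform(-np.pi, np.pi, size=n_hidden)

def hidden(x):
    """tanh features for a 1-D array of inputs, plus a constant column."""
    h = np.tanh(np.outer(x, w) + b)
    return np.column_stack([h, np.ones_like(x)])

# Fit only the output weights, using samples from [-pi, pi].
x_train = np.linspace(-np.pi, np.pi, 400)
coef, *_ = np.linalg.lstsq(hidden(x_train), np.sin(x_train), rcond=None)

def predict(x):
    return hidden(np.asarray(x, dtype=float)) @ coef

# Compare errors inside the training interval and well outside it.
x_in = rng.uniform(-np.pi, np.pi, 1000)
x_out = np.linspace(2 * np.pi, 4 * np.pi, 1000)
err_in = np.max(np.abs(predict(x_in) - np.sin(x_in)))
err_out = np.max(np.abs(predict(x_out) - np.sin(x_out)))
print(f"max error on [-pi, pi]:      {err_in:.2e}")
print(f"max error on [2*pi, 4*pi]:   {err_out:.2e}")
print(f"prediction at 4*pi (true 0): {predict([4 * np.pi])[0]:.3f}")
```

Nothing in the fit constrains the model beyond the sampled interval, so the outside error is expected to be far larger than the inside error.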
If periodic behavior outside the training range matters, you have a few options:
- train on several periods instead of one
- use input features such as sin(x) and cos(x) if the goal is engineering rather than pedagogy
- choose architectures that better represent periodic structure
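The feature-based option can be sketched as follows. Once the input is mapped to (sin x, cos x), any model built on top is periodic by construction, because the features themselves repeat every 2 * pi. The helper name periodic_features is hypothetical, and plain least squares stands in for a dense network. With sin(x) itself as the target the fit is trivial, which is why this is an engineering shortcut rather than a learning exercise; the same features help with less trivial periodic targets.

```python
import numpy as np

def periodic_features(x):
    """Map x to the pair (sin x, cos x); hypothetical helper name."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.sin(x), np.cos(x)])

x = np.linspace(-np.pi, np.pi, 200)

# The features two periods later match, up to floating-point error.
same = np.allclose(periodic_features(x), periodic_features(x + 4 * np.pi))
print("features repeat after two periods:", same)

# Any model on top of these features inherits the period. Least squares
# is used here as a stand-in for a dense network.
coef, *_ = np.linalg.lstsq(periodic_features(x), np.sin(x), rcond=None)
far = np.array([4 * np.pi, 10.5 * np.pi])
pred_far = periodic_features(far) @ coef
print("prediction far outside the training range:", pred_far)
print("true values:                              ", np.sin(far))
```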
Activation choice affects fit quality
You can solve this with ReLU layers, but smooth activations often make the regression problem easier. tanh is a common default because the target is smooth and oscillatory. ReLU networks can still fit the curve, but they approximate it piecewise and may need more capacity or more training.
The main lesson is not that one activation is universally best, but that the network should match the structure of the target problem. Smooth target functions usually reward smooth hidden activations.
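The piecewise nature of a ReLU fit can be illustrated numerically. With a scalar input, a one-hidden-layer ReLU network computes a continuous piecewise-linear function with at most one kink per hidden unit. The sketch below uses linear interpolation on equally spaced knots as a stand-in for such a function (a trained network could place its kinks differently), and compares the error against the standard interpolation bound h^2 / 8, which applies because the second derivative of sin(x) never exceeds 1 in magnitude.

```python
import numpy as np

x_dense = np.linspace(-np.pi, np.pi, 20001)

def piecewise_linear_error(n_segments):
    """Max error of linear interpolation of sin on equally spaced knots."""
    knots = np.linspace(-np.pi, np.pi, n_segments + 1)
    approx = np.interp(x_dense, knots, np.sin(knots))
    return float(np.max(np.abs(approx - np.sin(x_dense))))

for n in (4, 8, 16, 32):
    h = 2 * np.pi / n  # segment width
    err = piecewise_linear_error(n)
    print(f"{n:3d} segments: max error {err:.4f}, bound h^2/8 = {h * h / 8:.4f}")
```

Doubling the number of segments cuts the error by roughly a factor of four, which gives a feel for why a ReLU network may need more units than a tanh network to reach the same accuracy on a smooth curve.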
Visual evaluation is useful here
A regression loss alone does not always tell you whether the shape is right. Plot the predicted curve against the real sine wave.
If the network underfits, the curve looks flattened or phase-shifted. If training is unstable, you may see jagged or noisy predictions.
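A plotting helper along these lines makes both failure modes easy to spot. This is a sketch that assumes Matplotlib is available; the function name plot_fit is hypothetical, and y_pred is a hand-made stand-in for the output of a model's predict call, deliberately flattened and phase-shifted so the residual panel has something to show.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

def plot_fit(x, y_pred, title="Prediction vs. sin(x)"):
    """Overlay predictions on the true curve and show the residual below."""
    fig, (ax_fit, ax_res) = plt.subplots(2, 1, sharex=True, figsize=(7, 5))
    ax_fit.plot(x, np.sin(x), label="sin(x)")
    ax_fit.plot(x, y_pred, "--", label="prediction")
    ax_fit.set_title(title)
    ax_fit.legend()
    ax_res.plot(x, y_pred - np.sin(x))
    ax_res.axhline(0.0, linewidth=0.5)
    ax_res.set_xlabel("x")
    ax_res.set_ylabel("prediction - sin(x)")
    return fig

x = np.linspace(-np.pi, np.pi, 500)
# Stand-in for model predictions: a flattened, phase-shifted sine.
y_pred = 0.8 * np.sin(x - 0.2)
fig = plot_fit(x, y_pred)
# Call fig.savefig("sine_fit.png") or plt.show() to view the result.
```

The residual panel is often more informative than the overlay: a systematic wave in the residual points to underfitting or a phase shift, while high-frequency noise points to unstable training.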
What success looks like
A good approximation on the training domain means:
- low mean squared error
- a predicted curve visually close to sin(x)
- stable behavior on nearby unseen points inside the same interval
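The first and third criteria can be checked numerically. The sketch below evaluates a predictor at the midpoints between sorted training inputs, which are unseen points inside the same interval. The function name interpolation_report is hypothetical, and a degree-9 polynomial fit stands in for a trained network so that the example runs with NumPy alone.

```python
import numpy as np
from numpy.polynomial import Polynomial

def interpolation_report(predict, x_train):
    """Compare error on training inputs with error at unseen midpoints."""
    xs = np.sort(np.asarray(x_train, dtype=float))
    x_mid = (xs[:-1] + xs[1:]) / 2.0  # unseen, but inside the interval

    def mse(x):
        return float(np.mean((predict(x) - np.sin(x)) ** 2))

    return {
        "mse_train": mse(xs),
        "mse_unseen": mse(x_mid),
        "max_abs_unseen": float(np.max(np.abs(predict(x_mid) - np.sin(x_mid)))),
    }

x_train = np.linspace(-np.pi, np.pi, 50)
# Stand-in predictor; replace with a wrapper around a trained model.
predict = Polynomial.fit(x_train, np.sin(x_train), deg=9)
report = interpolation_report(predict, x_train)
for name, value in report.items():
    print(f"{name}: {value:.3e}")
```

If the error at the midpoints is much larger than the error on the training inputs, the model is memorizing samples rather than tracking the curve between them.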
It does not necessarily mean the model discovered trigonometry. In most cases it just learned a flexible approximation over the region where you gave it examples.
That distinction matters because beginners often mistake interpolation performance for genuine symbolic understanding.
Common Pitfalls
A common mistake is expecting good extrapolation outside the training interval. A basic dense network usually will not preserve periodicity automatically.
Another mistake is training on too few points or too few epochs and concluding that neural networks cannot model smooth functions. This is usually a training setup issue, not a fundamental limitation.
A third mistake is ignoring input scaling. Extremely large input ranges can make optimization harder and can waste model capacity on representing scale rather than shape.
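One mechanism behind the scaling problem is activation saturation. The sketch below assumes a single tanh unit with weight 1 and an input range of [-100, 100]; both are illustrative choices, and the helper name saturated_fraction is hypothetical. Because the derivative of tanh is 1 - tanh^2, a saturated unit passes almost no gradient.

```python
import numpy as np

x_raw = np.linspace(-100.0, 100.0, 2001)
x_scaled = x_raw / np.max(np.abs(x_raw))  # rescaled to [-1, 1]

def saturated_fraction(x, weight=1.0, threshold=0.99):
    """Share of inputs for which a tanh unit with this weight is saturated."""
    return float(np.mean(np.abs(np.tanh(weight * x)) > threshold))

def mean_gradient(x, weight=1.0):
    """Average derivative of tanh(weight * x) with respect to its argument."""
    return float(np.mean(1.0 - np.tanh(weight * x) ** 2))

frac_raw = saturated_fraction(x_raw)
frac_scaled = saturated_fraction(x_scaled)
print(f"saturated inputs, raw:    {frac_raw:.1%}")
print(f"saturated inputs, scaled: {frac_scaled:.1%}")
print(f"mean tanh gradient, raw:    {mean_gradient(x_raw):.4f}")
print(f"mean tanh gradient, scaled: {mean_gradient(x_scaled):.4f}")
```

Rescaling does not remove the difficulty of fitting many oscillations, since the target becomes a higher-frequency function of the scaled input, but it keeps the first layer in a range where gradients are usable from the start.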
Summary
- sin(x) is a useful regression toy problem for understanding neural-network approximation.
- A small dense network with tanh activations is usually enough on a bounded interval.
- The training range strongly determines what the model learns.
- Plot predictions against the true curve instead of relying only on a scalar loss.
- Good interpolation does not imply the model learned periodicity outside the data it saw.

