Correct backpropagation in simple perceptron

Backpropagation is a central mechanism for training neural networks, including simple perceptrons. A perceptron is the most basic unit of a neural network: a linear combination of input features followed by a non-linear activation function. Backpropagation computes how the loss changes with each weight, so the weights can be adjusted to minimize the error between the predicted and actual outputs.

Understanding Perceptron

A perceptron takes several inputs, passes them through a weighted summation, and finally applies an activation function to produce an output. Mathematically, it can be expressed as:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:

• $y$ is the output.
• $f$ is the activation function (e.g., sigmoid, hyperbolic tangent).
• $w_i$ are the weights.
• $x_i$ are the inputs.
• $b$ is the bias term.
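
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `sigmoid` and `perceptron_forward` and the input values are assumptions made for this sketch, and sigmoid is used as the default activation.

```python
import math

def sigmoid(s):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def perceptron_forward(x, w, b, activation=sigmoid):
    """Return f(sum_i w_i * x_i + b) for one input vector x."""
    s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # weighted sum plus bias
    return activation(s)

# Illustrative values only (chosen for this sketch)
y = perceptron_forward(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(y)  # s = 0.5 - 0.5 + 0 = 0, so the sigmoid gives 0.5
```

Passing the activation as a parameter keeps the weighted sum separate from the non-linearity, so a different activation can be swapped in without touching the summation.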

Error Calculation

The error of the perceptron is determined using a loss function, commonly the mean squared error (MSE) for regression tasks or cross-entropy loss for classification tasks. For MSE, it can be expressed as:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:

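• $y_i$ is the actual output for the $i^{th}$ data point.
• $\hat{y}_i$ is the predicted output for the $i^{th}$ data point.
• $n$ is the number of data points here (not the number of inputs, as in the perceptron equation).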
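
As a small sketch of the MSE formula, assuming plain Python lists of targets and predictions (the function name `mse` and the sample values are illustrative):

```python
def mse(y_true, y_pred):
    """Mean squared error over n paired targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# Illustrative values: the errors are 0.5 and -0.5, each squared to 0.25
print(mse([1.0, 0.0], [0.5, 0.5]))  # 0.25
```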

Backpropagation Algorithm

Backpropagation computes the gradients that an optimizer such as gradient descent uses to train a neural network. Training involves two major steps:

1. Forward Pass

In the forward pass, inputs are fed through the network, and outputs are calculated. The output is compared to the actual output using a defined error metric (like MSE).

2. Backward Pass

In the backward pass, the error is propagated backwards through the network to update the weights. This involves the following steps:

Step 1: Compute the gradient

Calculate the gradient of the loss function with respect to each weight, using the chain rule of calculus. This gradient indicates how a slight change in each weight affects the loss. If we consider a single perceptron with weights $\mathbf{w}$ and loss $L$, the gradient for each weight $w_j$ is computed as:

$$\frac{\partial L}{\partial w_j} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial s} \cdot \frac{\partial s}{\partial w_j}$$

Where:

• $s = \sum_{i=1}^{n} w_i x_i + b$
• $\hat{y} = f(s)$
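
To make the three factors concrete, here is how they work out in one common special case. The assumptions are a sigmoid activation $f = \sigma$ and a single-example squared-error loss $L = \frac{1}{2}(y - \hat{y})^2$, where the factor $\frac{1}{2}$ is a convenience that cancels the exponent when differentiating:

$$\frac{\partial L}{\partial \hat{y}} = -(y - \hat{y}), \qquad \frac{\partial \hat{y}}{\partial s} = \sigma(s)\,(1 - \sigma(s)) = \hat{y}(1 - \hat{y}), \qquad \frac{\partial s}{\partial w_j} = x_j$$

Multiplying them gives the gradient for each weight, and the same reasoning with $\frac{\partial s}{\partial b} = 1$ gives the gradient for the bias:

$$\frac{\partial L}{\partial w_j} = -(y - \hat{y})\,\hat{y}(1 - \hat{y})\,x_j, \qquad \frac{\partial L}{\partial b} = -(y - \hat{y})\,\hat{y}(1 - \hat{y})$$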

Step 2: Update Weights

Once the gradient is computed, update the weights using gradient descent. The weight update rule is expressed as:

$$w_j = w_j - \eta \cdot \frac{\partial L}{\partial w_j}$$

Where $\eta$ is the learning rate, a hyperparameter that controls the step size during optimization.
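
The gradient and the update rule can be combined into a single training step. The sketch below assumes a sigmoid activation and the single-example loss $L = \frac{1}{2}(y - \hat{y})^2$. The function name `train_step`, the starting weights, the training example, and the learning rate of 0.5 are all hypothetical choices for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(w, b, x, y, lr):
    """One gradient-descent update for a sigmoid perceptron.

    Assumes the single-example loss L = 0.5 * (y - y_hat)**2.
    Returns the updated weights, updated bias, and the loss before the update.
    """
    s = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    y_hat = sigmoid(s)
    loss = 0.5 * (y - y_hat) ** 2
    # Chain rule: dL/ds = dL/dy_hat * dy_hat/ds
    delta = -(y - y_hat) * y_hat * (1.0 - y_hat)
    # ds/dw_j = x_j and ds/db = 1
    new_w = [w_j - lr * delta * x_j for w_j, x_j in zip(w, x)]
    new_b = b - lr * delta
    return new_w, new_b, loss

# Hypothetical starting point and training example
w, b = [0.1, -0.2], 0.0
x, y = [1.0, 2.0], 1.0
losses = []
for _ in range(100):
    w, b, loss = train_step(w, b, x, y, lr=0.5)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss shrinks as the updates accumulate
```

Returning new values instead of mutating the inputs keeps each step easy to inspect. The bias is updated with the same rule as the weights, using $\frac{\partial s}{\partial b} = 1$.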

Examples of Backpropagation

Consider a simple perceptron with inputs $x_1 = 0.5$ and $x_2 = 0.3$, weights $w_1 = 0.8$ and $w_2 = 0.4$, bias $b = 0.1$, and a sigmoid activation function. The predicted output $\hat{y}$ and actual output $y$ are:

• Predicted: $\hat{y} = \sigma(0.8 \times 0.5 + 0.4 \times 0.3 + 0.1) = \sigma(0.62)$
• Actual: $y = 1$

Forward Pass

Calculate the output:

$$\hat{y} = \frac{1}{1 + e^{-0.62}} \approx 0.650$$

Calculate the error for this single example, using the squared error with a factor of $\frac{1}{2}$ so that its derivative is simply $-(y - \hat{y})$:

$$L = \frac{1}{2}(1 - 0.650)^2 \approx 0.061$$

Backward Pass

Calculate the error gradient for $w_1$. For a sigmoid, $\frac{\partial \hat{y}}{\partial s} = \hat{y}(1 - \hat{y})$, and $\frac{\partial s}{\partial w_1} = x_1$:

$$\frac{\partial L}{\partial w_1} = -(1 - 0.650) \cdot 0.650 \cdot (1 - 0.650) \cdot 0.5 \approx -0.0398$$

Update weights, using a learning rate of $\eta = 0.01$:

$$w_1 = 0.8 - 0.01 \times (-0.0398) \approx 0.8004$$

The gradient is negative, so the update increases $w_1$, which moves $\hat{y}$ toward the target of 1.
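
A worked example like this is easy to get wrong by hand, so it helps to recompute it from the stated inputs. The sketch below does that, and also compares the chain-rule gradient against a central finite difference as an independent check. It assumes the single-example loss $L = \frac{1}{2}(y - \hat{y})^2$ and the learning rate 0.01 from the update line. The names `loss_fn`, `eps`, and `numeric` are introduced here for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def loss_fn(w, x, b, y):
    """Single-example loss L = 0.5 * (y - y_hat)**2 with a sigmoid output."""
    s = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 0.5 * (y - sigmoid(s)) ** 2

# Values from the example above
x, w, b, y = [0.5, 0.3], [0.8, 0.4], 0.1, 1.0
eta = 0.01  # learning rate used in the update line

s = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
y_hat = sigmoid(s)

# Analytic gradient from the chain rule
grad_w1 = -(y - y_hat) * y_hat * (1.0 - y_hat) * x[0]

# Central finite difference as an independent check
eps = 1e-6
numeric = (loss_fn([w[0] + eps, w[1]], x, b, y)
           - loss_fn([w[0] - eps, w[1]], x, b, y)) / (2 * eps)

new_w1 = w[0] - eta * grad_w1
print(f"s={s:.4f} y_hat={y_hat:.4f} L={loss_fn(w, x, b, y):.4f}")
print(f"analytic={grad_w1:.6f} numeric={numeric:.6f} new_w1={new_w1:.6f}")
```

If the analytic and numeric gradients disagree beyond rounding error, the derivation or the implementation contains a mistake, most often a sign error.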

Key Considerations in Backpropagation

• Learning Rate: A crucial hyperparameter in training neural networks. A small learning rate results in slow convergence, while a large learning rate can lead to divergence.
• Activation Functions: The choice of activation function affects the performance of a neural network. Sigmoid functions were popular in the early days but suffer from the vanishing gradient problem. ReLU and its variants are preferred in deeper networks.
• Weight Initialization: Poor weight initialization can lead to slow convergence or to getting stuck in local minima.
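
The vanishing gradient remark can be illustrated numerically. The sigmoid derivative $\sigma(s)(1 - \sigma(s))$ peaks at 0.25 and shrinks quickly as $|s|$ grows, while the ReLU derivative stays at 1 for positive inputs. The helper names `sigmoid_grad` and `relu_grad` and the sample points are assumptions made for this sketch.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_grad(s):
    """Derivative of the sigmoid: sigma(s) * (1 - sigma(s)), at most 0.25."""
    y = sigmoid(s)
    return y * (1.0 - y)

def relu_grad(s):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if s > 0 else 0.0

# Compare the two derivatives at a few pre-activation values
for s in (0.0, 2.0, 5.0, 10.0):
    print(f"s={s:5.1f}  sigmoid'={sigmoid_grad(s):.6f}  relu'={relu_grad(s):.1f}")
```

Because the chain rule multiplies these derivatives together, a factor this small scales down the weight update in proportion, which is why a saturated sigmoid unit learns slowly.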

Summary Table

| Step in Backpropagation | Description |
| --- | --- |
| Forward Pass | Compute predicted output by passing inputs forward through network. |
| Compute Loss | Calculate loss between actual and predicted output using a cost function. |
| Backward Pass | Calculate gradients by propagating loss backwards through network. |
| Update Weights | Adjust weights using gradient descent. |

Understanding and implementing correct backpropagation in a simple perceptron lays the foundation for constructing and training more complex neural networks effectively. As you gain familiarity with the basic concepts and nuances of perceptrons, transitioning to multi-layer networks will be considerably more manageable.

