Correct backpropagation in simple perceptron

Backpropagation is a central mechanism for training neural networks, including simple perceptrons. A perceptron, the most basic unit of a neural network, computes a linear combination of its input features and passes the result through a non-linear activation function. Backpropagation provides the gradients needed to adjust the weights so that the error between the predicted and actual outputs is minimized.

Understanding Perceptron

A perceptron takes several inputs, passes them through a weighted summation, and finally applies an activation function to produce an output. Mathematically, it can be expressed as:

$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$

Where:

• $y$ is the output.
• $f$ is the activation function (e.g., sigmoid, hyperbolic tangent).
• $w_i$ are the weights.
• $x_i$ are the inputs.
• $b$ is the bias term.
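The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `sigmoid` and `perceptron_output` are hypothetical, and the sigmoid is just one of the activation functions mentioned above.

```python
import math


def sigmoid(s):
    """Logistic sigmoid, one possible choice for the activation f."""
    return 1.0 / (1.0 + math.exp(-s))


def perceptron_output(weights, inputs, bias, activation=sigmoid):
    """Compute y = f(sum_i w_i * x_i + b) for a single perceptron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # Weighted summation of the inputs plus the bias term.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear activation applied to the weighted sum.
    return activation(s)


# Illustrative values only (assumed for this sketch).
y = perceptron_output(weights=[0.2, -0.4], inputs=[1.0, 2.0], bias=0.3)
print(f"y = {y:.4f}")
```

Passing a different callable as `activation` (for example `math.tanh`) swaps the non-linearity without touching the weighted sum.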

Error Calculation

The error of the perceptron is determined using a loss function, commonly the mean squared error (MSE) for regression tasks or cross-entropy loss for classification tasks. For MSE over $N$ data points, it can be expressed as:

$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$

Where:

• $y_i$ is the actual output for the $i^{th}$ data point.
• $\hat{y}_i$ is the predicted output for the $i^{th}$ data point.
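As a small sketch of the MSE definition above, the following function averages the squared differences between targets and predictions. The function name and the sample values are assumptions made for illustration.

```python
def mean_squared_error(targets, predictions):
    """Average of squared differences between actual and predicted outputs."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)]
    return sum(squared_errors) / len(targets)


# Illustrative values only: two data points, each off by 0.5.
print(mean_squared_error([1.0, 0.0], [0.5, 0.5]))
```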

Backpropagation Algorithm

Backpropagation computes the gradients that an optimizer such as gradient descent uses to train a neural network. Training with it involves two major steps:

1. Forward Pass

In the forward pass, inputs are fed through the network, and outputs are calculated. The output is compared to the actual output using a defined error metric (like MSE).

2. Backward Pass

In the backward pass, the error is propagated backwards through the network to update the weights. This involves the following steps:

Step 1: Compute the gradient

Calculate the gradient of the loss function with respect to each weight, using the chain rule of calculus. This gradient indicates how a slight change in each weight affects the loss. If we consider a single perceptron with weights $\mathbf{w}$ and loss $L$, the gradient for each weight $w_j$ is computed as:

$\frac{\partial L}{\partial w_j} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial s} \cdot \frac{\partial s}{\partial w_j}$

Where:

• $s = \sum_{i=1}^{n} w_i x_i + b$
• $\hat{y} = f(s)$
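To make the chain rule concrete, the sketch below evaluates the three factors for one specific setup and compares the result with a finite-difference estimate. It assumes a sigmoid activation and a halved squared error $L = \frac{1}{2}(y - \hat{y})^2$ on a single sample; the function names and the sample values are hypothetical.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def loss(weights, bias, inputs, target):
    """Halved squared error for one sample (an assumption of this sketch)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 0.5 * (target - sigmoid(s)) ** 2


def weight_gradients(weights, bias, inputs, target):
    """Analytic dL/dw_j via the chain rule: dL/dy_hat * dy_hat/ds * ds/dw_j."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    y_hat = sigmoid(s)
    dL_dyhat = y_hat - target         # derivative of 0.5 * (target - y_hat)^2
    dyhat_ds = y_hat * (1.0 - y_hat)  # derivative of the sigmoid
    return [dL_dyhat * dyhat_ds * x_j for x_j in inputs]  # ds/dw_j = x_j


def numerical_gradient(weights, bias, inputs, target, j, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    plus = list(weights)
    minus = list(weights)
    plus[j] += eps
    minus[j] -= eps
    return (loss(plus, bias, inputs, target) - loss(minus, bias, inputs, target)) / (2 * eps)


# Illustrative values only.
w, b, x, y = [0.3, -0.2], 0.05, [1.5, 2.0], 1.0
analytic = weight_gradients(w, b, x, y)
numeric = [numerical_gradient(w, b, x, y, j) for j in range(len(w))]
print(analytic, numeric)
```

Agreement between the analytic and numerical values is a common way to catch sign or indexing mistakes in a hand-written backward pass.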

Step 2: Update Weights

Once the gradient is computed, update the weights using gradient descent. The weight update rule is expressed as:

$w_j = w_j - \eta \cdot \frac{\partial L}{\partial w_j}$

Where $\eta$ is the learning rate, a hyperparameter that controls the step size during optimization.
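The update rule translates directly into code. The helper below is a hypothetical sketch that applies one gradient descent step to a list of weights, given gradients that have already been computed.

```python
def gradient_descent_step(weights, gradients, learning_rate):
    """Return new weights w_j - eta * dL/dw_j for every j."""
    if len(weights) != len(gradients):
        raise ValueError("weights and gradients must have the same length")
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# Illustrative values only: a positive gradient lowers the weight,
# a negative gradient raises it.
new_weights = gradient_descent_step([1.0, 2.0], [0.5, -0.5], learning_rate=0.1)
print(new_weights)
```

The bias can be updated with the same rule, since $\frac{\partial s}{\partial b} = 1$.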

Examples of Backpropagation

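Consider a simple perceptron with inputs $x_1 = 0.5$ and $x_2 = 0.3$, weights $w_1 = 0.8$ and $w_2 = 0.4$, bias $b = 0.1$, and a sigmoid activation function. The predicted output $\hat{y}$ and actual output $y$ are:

• Predicted: $\hat{y} = \sigma(0.8 \times 0.5 + 0.4 \times 0.3 + 0.1) = \sigma(0.62)$
• Actual: $y = 1$

Forward Pass

Calculate the output:

$\hat{y} = \frac{1}{1 + e^{-0.62}} \approx 0.650$

Calculate the error for this single sample using the squared error, halved so that the factor of 2 cancels in the derivative:

$L = \frac{1}{2}(1 - 0.650)^2 \approx 0.061$

Backward Pass

Calculate the error gradient for $w_1$, using $\frac{\partial L}{\partial \hat{y}} = -(y - \hat{y})$, $\frac{\partial \hat{y}}{\partial s} = \hat{y}(1 - \hat{y})$, and $\frac{\partial s}{\partial w_1} = x_1$:

$\frac{\partial L}{\partial w_1} = -(1 - 0.650) \cdot 0.650 \cdot (1 - 0.650) \cdot 0.5 \approx -0.0398$

Update the weight with learning rate $\eta = 0.01$:

$w_1 = 0.8 - 0.01 \times (-0.0398) \approx 0.8004$

Because the prediction is below the target, the gradient is negative and the update increases $w_1$.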
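Hand arithmetic in a worked example like this is easy to get wrong, so it can help to recompute it. The sketch below uses only the inputs, weights, bias, and target stated in the example, plus the learning rate $0.01$ from the update step; the variable names are assumptions.

```python
import math

# Values taken from the example's problem statement.
x = [0.5, 0.3]
w = [0.8, 0.4]
b = 0.1
y = 1.0
eta = 0.01

# Forward pass: weighted sum, then sigmoid.
s = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
y_hat = 1.0 / (1.0 + math.exp(-s))

# Halved squared error for the single sample.
loss = 0.5 * (y - y_hat) ** 2

# Backward pass for w_1: dL/dy_hat * dy_hat/ds * ds/dw_1.
grad_w1 = (y_hat - y) * y_hat * (1.0 - y_hat) * x[0]

# Gradient descent update for w_1.
w1_new = w[0] - eta * grad_w1

print(f"s={s:.2f}  y_hat={y_hat:.3f}  loss={loss:.3f}")
print(f"grad_w1={grad_w1:.4f}  w1_new={w1_new:.4f}")
```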

Key Considerations in Backpropagation

• Learning Rate: A crucial hyperparameter in training neural networks. A small learning rate results in slow convergence, while a large learning rate can lead to divergence.
• Activation Functions: The choice of activation function affects the performance of a neural network. Sigmoid functions were popular in the early days but suffer from the vanishing gradient problem. ReLU and its variants are preferred in deeper networks.
• Weight Initialization: Poor weight initialization can lead to slow convergence or getting stuck in local minima.
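The effect of the learning rate on convergence speed can be observed with a small experiment. The sketch below repeats the update on a single sample (the inputs, weights, and bias are assumed for illustration) and reports the loss after a fixed number of steps for several learning rates. This toy setup only shows the slow-convergence side; it is too simple to exhibit divergence.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def loss_after_training(learning_rate, steps=50):
    """Run `steps` gradient descent updates on one sample and return the final loss."""
    w = [0.8, 0.4]
    b = 0.1
    x = [0.5, 0.3]
    y = 1.0
    for _ in range(steps):
        y_hat = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
        # dL/ds for the halved squared error with a sigmoid activation.
        delta = (y_hat - y) * y_hat * (1.0 - y_hat)
        w = [w_j - learning_rate * delta * x_j for w_j, x_j in zip(w, x)]
        b = b - learning_rate * delta  # ds/db = 1
    y_hat = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
    return 0.5 * (y - y_hat) ** 2


for eta in (0.01, 0.1, 1.0):
    print(f"eta={eta}: loss after 50 steps = {loss_after_training(eta):.5f}")
```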

Summary Table

| Step in Backpropagation | Description |
| --- | --- |
| Forward Pass | Compute the predicted output by passing inputs forward through the network. |
| Compute Loss | Calculate the loss between the actual and predicted output using a cost function. |
| Backward Pass | Calculate gradients by propagating the loss backwards through the network. |
| Update Weights | Adjust weights using gradient descent. |
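The four steps in the table can be combined into a short training loop. The following is a minimal sketch under stated assumptions: a sigmoid activation, a halved squared error averaged over the batch, zero initialization, and the logical AND function as a toy dataset. The function name `train` and the hyperparameter values are hypothetical choices, not tuned recommendations.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train(samples, learning_rate=1.0, epochs=10000):
    """Full-batch gradient descent for a single sigmoid perceptron."""
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs
    b = 0.0
    history = []
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        total_loss = 0.0
        for x, y in samples:
            # 1. Forward pass.
            y_hat = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            # 2. Compute loss (halved squared error).
            total_loss += 0.5 * (y - y_hat) ** 2
            # 3. Backward pass: dL/ds, then accumulate dL/dw_j and dL/db.
            delta = (y_hat - y) * y_hat * (1.0 - y_hat)
            for j, x_j in enumerate(x):
                grad_w[j] += delta * x_j
            grad_b += delta
        # 4. Update weights and bias with the batch-averaged gradient.
        m = len(samples)
        w = [w_j - learning_rate * g / m for w_j, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / m
        history.append(total_loss / m)
    return w, b, history


# Toy dataset (assumed for illustration): logical AND.
AND_GATE = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]

w, b, history = train(AND_GATE)
predictions = [
    round(sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)) for x, _ in AND_GATE
]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

AND is linearly separable, which is why a single perceptron can fit it; a function such as XOR would need a multi-layer network.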

Understanding and implementing correct backpropagation in a simple perceptron lays the foundation for constructing and training more complex neural networks effectively. As you gain familiarity with the basic concepts and nuances of perceptrons, transitioning to multi-layer networks will be considerably more manageable.