How does inverting dropout compensate for the effect of dropout and keep expected values unchanged?
Dropout is a popular technique for preventing overfitting when training neural networks. Introduced by Srivastava et al. in 2014, it randomly sets a proportion of units to zero during the training phase. Because dropping units lowers the expected activation, some rescaling is needed to keep training and inference consistent. The original formulation multiplies the weights by the retention probability at test time; inverted dropout instead divides the retained activations by the retention probability during training, so that inference needs no adjustment. This article explains how this rescaling compensates for the effect of dropout and keeps expected values unchanged.
Understanding Dropout
Mechanism of Dropout
Dropout works by randomly "dropping" units in the neural network, that is, setting their activation to zero during each forward pass at training time. This stochastic behavior forces the network to learn redundant representations and discourages units from co-adapting. Dropout is usually governed by a parameter p, the probability of retaining a unit:
- Probability of retaining a unit = p
- Probability of dropping a unit = 1 - p
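The retention rule above can be sketched as sampling a 0/1 mask. This is a minimal illustration that assumes NumPy; the helper name `sample_keep_mask` is hypothetical and not taken from any library.

```python
import numpy as np

def sample_keep_mask(shape, p, rng):
    """Return a 0/1 mask in which each entry is 1 (unit kept) with probability p."""
    # rng.random draws uniform samples in [0, 1); a sample falls below p with probability p.
    return (rng.random(shape) < p).astype(np.float64)

rng = np.random.default_rng(0)
mask = sample_keep_mask((100_000,), p=0.8, rng=rng)

# For a large sample, the fraction of kept units should be close to p.
keep_fraction = mask.mean()
print(keep_fraction)
```

A fresh mask is normally drawn on every forward pass, so different units are dropped each time.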
Advantages of Dropout
- Reduces Overfitting: By not relying on any single feature, the network's dependency on specific neurons decreases.
- Encourages Robust Features: The network learns more robust features that are useful in the presence of noise.
- Simplifies Ensembling: It simulates training several networks at once due to its stochastic nature.
Inverting Dropout
Addressing Test Time
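During inference or test time, dropout is switched off so that predictions are deterministic and free of stochastic noise. Since every unit is now active, the activations would be larger on average than during training unless something compensates. The original dropout formulation compensates at test time by scaling the outgoing weights of the neurons by the retention probability p. Inverted dropout moves the compensation into training: retained activations are divided by p, and the network is used unchanged at test time.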
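The difference between training and inference can be sketched as a single function with a `training` flag. This is an illustrative version of inverted dropout that assumes NumPy, with p as the retention probability; the function name and signature are hypothetical rather than any framework's API.

```python
import numpy as np

def inverted_dropout(z, p, training, rng=None):
    """Apply inverted dropout to activations z with retention probability p."""
    z = np.asarray(z, dtype=np.float64)
    if not training:
        # Inference: no mask and no scaling, the activations pass through unchanged.
        return z
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(z.shape) < p
    # Training: kept units are divided by p, dropped units become zero.
    return np.where(keep, z / p, 0.0)

z = np.array([0.2, 0.5, 0.8])
train_out = inverted_dropout(z, p=0.8, training=True, rng=np.random.default_rng(1))
test_out = inverted_dropout(z, p=0.8, training=False)
print(train_out, test_out)
```

Placing the scaling in the training branch means the inference path carries no dropout-specific logic at all.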
Technical Explanation
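When training with dropout, each neuron's activation z_j is multiplied by a random mask:

$$\tilde{z}_j = r_j \, z_j$$

Here, r_j is a random mask with a Bernoulli distribution, where r_j is 1 with probability p and 0 with probability 1 - p.
The expected value of the masked activation during training is therefore:

$$\mathbb{E}[\tilde{z}_j] = p \cdot z_j + (1 - p) \cdot 0 = p \, z_j$$

At test time no units are dropped, so without compensation the activation would be z_j, which is larger than its training-time expectation by a factor of 1/p.
Inverted Dropout Rule
Standard dropout closes this gap at inference by scaling the weights by p, which gives p z_j. Inverted dropout closes it during training by dividing the retained activations by p. The expression becomes:

$$\tilde{z}_j = \frac{r_j}{p} \, z_j, \qquad \mathbb{E}[\tilde{z}_j] = \frac{p}{p} \, z_j = z_j$$

This adjustment makes the expected training activation equal to the unscaled test-time activation, so the network performs consistently between training and testing.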
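The claim about expected values can be checked numerically. The sketch below assumes NumPy and averages many random masks; the sample count, seed, and variable names are arbitrary choices for illustration. It compares the plain masked activation, whose mean should approach p times the activation, with the inverted version, whose mean should approach the activation itself.

```python
import numpy as np

rng = np.random.default_rng(42)
z = np.array([0.2, 0.5, 0.8])   # activations of three units
p = 0.8                         # retention probability
n = 200_000                     # number of simulated forward passes

# One Bernoulli(p) mask per simulated pass and per unit.
masks = rng.random((n, z.size)) < p

# Plain masking: the average activation shrinks towards p * z.
standard_train_mean = (masks * z).mean(axis=0)

# Inverted dropout: dividing by p restores the average to z.
inverted_train_mean = (masks * z / p).mean(axis=0)

print(standard_train_mean, p * z)
print(inverted_train_mean, z)
```

The two means differ by exactly the factor p, which is why the compensation can be applied either at test time (multiply by p) or during training (divide by p).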
Practical Example
Consider a network with a hidden layer output of z_j = [0.2, 0.5, 0.8] and a dropout retention probability p = 0.8.
During Training
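- Apply dropout: each neuron is retained independently with probability p.
- Example: [0, 0.5, 0] if the first and last neurons are dropped.
- With inverted dropout, the retained values are also divided by p: [0, 0.5, 0] / 0.8 = [0, 0.625, 0].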
During Testing
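- No neurons are dropped, so the full output [0.2, 0.5, 0.8] is used.
- Standard dropout: scale by p, giving [0.2, 0.5, 0.8] * 0.8 = [0.16, 0.4, 0.64].
- Inverted dropout: no scaling is needed because the division by p already happened during training, giving [0.2, 0.5, 0.8].
In both cases, each neuron's expected contribution during training matches its contribution at test time.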
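The numbers in this example can be reproduced with a fixed mask. The sketch below assumes NumPy and hard-codes the mask in which the first and last neurons are dropped; at test time no neurons are dropped, so the scaling applies to the full output. Variable names are illustrative.

```python
import numpy as np

z = np.array([0.2, 0.5, 0.8])     # hidden layer output
p = 0.8                           # retention probability
mask = np.array([0.0, 1.0, 0.0])  # first and last neurons dropped

# Training pass with this particular mask.
standard_train = mask * z         # plain masking
inverted_train = mask * z / p     # masking plus division by p

# Test time: every neuron is active.
standard_test = p * z             # standard dropout scales by p
inverted_test = z                 # inverted dropout leaves the output unchanged

print(standard_train, inverted_train)
print(standard_test, inverted_test)
```

Under these assumptions the training outputs are [0, 0.5, 0] and [0, 0.625, 0], and the test outputs are [0.16, 0.4, 0.64] and [0.2, 0.5, 0.8].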
Key Points Summary
| Aspect | Description |
| --- | --- |
| Dropout | Randomly drops units during training to prevent overfitting. |
| Retention Probability p | Probability of retaining a unit (applies to neurons) during training. |
| Training with Dropout | Applies a random mask to the activations; introduces noise into training but helps generalization. |
| Inverted Dropout (Scaling) | Divides retained activations by p during training, so no scaling is needed at test time. Standard dropout instead scales weights by p at test time. |
| Expected Values | Kept equal between training and testing by either scaling strategy. |
Conclusion
Dropout is a powerful regularization technique that helps mitigate overfitting while compelling the network to develop redundant, generalized feature representations. The accompanying scaling, applied to the weights at test time in standard dropout or to the retained activations during training in inverted dropout, keeps expected activations equal across the two phases, so predictions at inference reflect the behavior learned during training. Understanding this scaling is important for anyone deploying neural networks in production, because a missing or doubled scaling factor silently shifts every activation downstream of the dropout layer.

