Dying ReLU: Causes and Solutions (Leaky ReLU)

ReLU (Rectified Linear Unit) is a widely used activation function in neural networks. It outputs zero when the input is negative or zero, and outputs the input unchanged when it is positive.


Mathematically, relu(z) = max(0, z)
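
As a quick illustration, here is a minimal NumPy sketch of ReLU and of its gradient (the gradient is what back-propagation uses):

```python
import numpy as np

def relu(z):
    """ReLU: passes positive inputs through, clamps negatives to zero."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Gradient of ReLU: 1 for positive inputs, 0 for negative inputs."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```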


For more details on ReLU and other activation functions, you can visit my post on activation functions in neural networks.


What is a Dying ReLU?


The dying ReLU problem refers to ReLU neurons becoming inactive and outputting 0 for every input. Once a neuron's pre-activation (its weighted sum plus bias) is negative for all inputs, it always outputs zero, the gradient flowing through it is zero, and it is very unlikely to recover: it stays inactive forever. Such a neuron no longer plays any role in discriminating between inputs and becomes useless to the network. If this keeps happening, over time you may end up with a large part of your network doing nothing.
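
To make this concrete, here is a minimal NumPy sketch (with made-up weights) of what a dead ReLU neuron looks like: its output and its gradient are both zero for every input, so gradient descent has nothing to update it with.

```python
import numpy as np

# Hypothetical dead neuron: the weights and bias happen to make the
# pre-activation w.x + b negative for every input the network will see.
w = np.array([0.5, -0.5])
b = -10.0

X = np.random.rand(100, 2)       # inputs lie in [0, 1)
z = X @ w + b                    # pre-activations: all negative here
a = np.maximum(0.0, z)           # ReLU outputs: all exactly 0

# Back-propagation multiplies upstream gradients by dReLU/dz, which is 0
# wherever z < 0, so the gradients for w and b are all 0 as well.
grad_z = (z > 0).astype(float)
print(a.max(), grad_z.max())     # 0.0 0.0 -> nothing will ever update this neuron
```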


What is the cause of Dying ReLU?


Let's see why the dying ReLU problem occurs. It is most likely to occur when:


1. The learning rate is too high, or
2. There is a large negative bias.


Consider the gradient-descent rule used to compute the new weights during back-propagation:


New Weight = Old Weight – (Learning Rate × Gradient of the Loss with respect to the Weight)


and recall that a ReLU neuron's pre-activation is z = (weights · inputs) + bias. If the learning rate is too high, one large gradient step can push the weights and the bias so far into the negative that z stays below zero for every input in the training data. A large negative bias has the same effect on its own.


Once z is negative for all inputs, the ReLU of that neuron always outputs zero, the gradient flowing back through it is also zero, so gradient descent never updates its weights again and the neuron dies.
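
Here is a toy sketch (made-up data, a single neuron, and a deliberately oversized learning rate) showing a ReLU neuron being killed by a few bad updates:

```python
import numpy as np

# Toy example: a single ReLU neuron on made-up data, trained with plain
# gradient descent and a deliberately huge learning rate.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(64, 1))      # inputs in [0, 1)
y = 2.0 * X[:, 0]                        # a simple target to regress

w, b = np.array([0.5]), 0.0
lr = 50.0                                # far too high on purpose

for step in range(4):
    z = X @ w + b                        # pre-activation
    a = np.maximum(0.0, z)               # ReLU output
    grad_a = 2 * (a - y) / len(y)        # dLoss/da for mean squared error
    grad_z = grad_a * (z > 0)            # zero wherever the neuron is inactive
    w = w - lr * (X.T @ grad_z)          # gradient-descent weight update
    b = b - lr * grad_z.sum()
    print(step, w, b, a.max())

# After a couple of oscillating steps, w and b end up so negative that
# z < 0 for every input: a.max() is 0.0, every gradient is exactly 0,
# and the neuron never recovers.
```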


What is the solution to Dying ReLU?


Leaky ReLU is the most common and effective way to alleviate dying ReLU. It adds a slight slope in the negative range, so the gradient there is small but never exactly zero.


Leaky ReLU has a small slope for negative values instead of being flat at zero. For example, leaky ReLU may have y = 0.0001x when x < 0.
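
A minimal NumPy sketch of leaky ReLU (0.01 is a common choice of slope; the 0.0001 above works the same way):

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of
    being clamped to zero, so their gradient is small but never zero."""
    return np.where(z > 0, z, slope * z)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))   # [-0.02  -0.005  0.5    2.   ]
```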


Parametric ReLU (PReLU) is a type of leaky ReLU that, instead of using a predetermined slope like 0.0001, treats the slope as a parameter the neural network learns itself: y = αx when x < 0.
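
If you work in PyTorch, it already ships a PReLU module whose slope is a learnable parameter; here is a minimal sketch of using it in place of ReLU:

```python
import torch
import torch.nn as nn

# nn.PReLU stores the negative-side slope as a learnable parameter
# (initialised to 0.25 by default), so the optimizer adjusts it along
# with the weights during training.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.PReLU(),            # one learnable slope shared across the layer
    nn.Linear(32, 1),
)

x = torch.randn(4, 10)
print(model(x).shape)                   # torch.Size([4, 1])
print(next(model[1].parameters()))      # the slope alpha the network learns
```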


Using a lower learning rate also often mitigates the problem.
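
For example (illustrative values only, not a recommendation for any particular problem), dropping the learning rate by an order of magnitude is often enough:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# A smaller step size makes it far less likely that a single update
# pushes a neuron's pre-activation permanently below zero.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # instead of, say, 1e-1
```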
