Dying ReLU: Causes and Solutions (Leaky ReLU)

ReLU (Rectified Linear Unit) is a widely used activation function in neural networks. It outputs zero when the input is negative or zero, and outputs the input unchanged when it is positive.


Mathematically, relu(z) = max(0, z)
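
As a quick illustration, here is a minimal NumPy sketch of ReLU and of its gradient (the gradient is what back-propagation uses):

```python
import numpy as np

def relu(z):
    """ReLU: passes positive inputs through, clamps negatives to zero."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Gradient of ReLU: 1 for positive inputs, 0 for negative inputs."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```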


For more details on ReLU and other activation functions, you can visit my post on activation functions in neural networks.


What is a Dying ReLU?


The dying ReLU problem refers to ReLU neurons becoming inactive and outputting 0 for every input. Once a neuron's pre-activation (its weighted sum plus bias) is negative for all inputs, it always outputs zero, the gradient flowing through it is zero, and it is very unlikely to recover: it stays inactive forever. Such a neuron no longer plays any role in discriminating between inputs and becomes useless to the network. If this keeps happening, over time you may end up with a large part of your network doing nothing.
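
To make this concrete, here is a minimal NumPy sketch (with made-up weights) of what a dead ReLU neuron looks like: its output and its gradient are both zero for every input, so gradient descent has nothing to update it with.

```python
import numpy as np

# Hypothetical dead neuron: the weights and bias happen to make the
# pre-activation w.x + b negative for every input the network will see.
w = np.array([0.5, -0.5])
b = -10.0

X = np.random.rand(100, 2)       # inputs lie in [0, 1)
z = X @ w + b                    # pre-activations: all negative here
a = np.maximum(0.0, z)           # ReLU outputs: all exactly 0

# Back-propagation multiplies upstream gradients by dReLU/dz, which is 0
# wherever z < 0, so the gradients for w and b are all 0 as well.
grad_z = (z > 0).astype(float)
print(a.max(), grad_z.max())     # 0.0 0.0 -> nothing will ever update this neuron
```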


What is the cause of Dying ReLU?


Let's see why the dying ReLU problem occurs. It is most likely to occur when:


1. The learning rate is too high, or
2. There is a large negative bias.


Consider the gradient-descent rule used to compute the new weights during back-propagation:


New Weight = Old Weight – (Learning Rate × Gradient of the Loss with respect to the Weight)


and recall that a ReLU neuron's pre-activation is z = (weights · inputs) + bias. If the learning rate is too high, one large gradient step can push the weights and the bias so far into the negative that z stays below zero for every input in the training data. A large negative bias has the same effect on its own.


Once z is negative for all inputs, the ReLU of that neuron always outputs zero, the gradient flowing back through it is also zero, so gradient descent never updates its weights again and the neuron dies.
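
Here is a toy sketch (made-up data, a single neuron, and a deliberately oversized learning rate) showing a ReLU neuron being killed by a few bad updates:

```python
import numpy as np

# Toy example: a single ReLU neuron on made-up data, trained with plain
# gradient descent and a deliberately huge learning rate.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(64, 1))      # inputs in [0, 1)
y = 2.0 * X[:, 0]                        # a simple target to regress

w, b = np.array([0.5]), 0.0
lr = 50.0                                # far too high on purpose

for step in range(4):
    z = X @ w + b                        # pre-activation
    a = np.maximum(0.0, z)               # ReLU output
    grad_a = 2 * (a - y) / len(y)        # dLoss/da for mean squared error
    grad_z = grad_a * (z > 0)            # zero wherever the neuron is inactive
    w = w - lr * (X.T @ grad_z)          # gradient-descent weight update
    b = b - lr * grad_z.sum()
    print(step, w, b, a.max())

# After a couple of oscillating steps, w and b end up so negative that
# z < 0 for every input: a.max() is 0.0, every gradient is exactly 0,
# and the neuron never recovers.
```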


What is the solution to Dying ReLU?


Leaky ReLU is the most common and effective way to alleviate dying ReLU. It adds a slight slope in the negative range, so the gradient there is small but never exactly zero.


Leaky ReLU has a small slope for negative values instead of being flat at zero. For example, leaky ReLU may have y = 0.0001x when x < 0.
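
A minimal NumPy sketch of leaky ReLU (0.01 is a common choice of slope; the 0.0001 above works the same way):

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of
    being clamped to zero, so their gradient is small but never zero."""
    return np.where(z > 0, z, slope * z)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))   # [-0.02  -0.005  0.5    2.   ]
```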


Parametric ReLU (PReLU) is a type of leaky ReLU that, instead of using a predetermined slope like 0.0001, treats the slope as a parameter the neural network learns itself: y = αx when x < 0.
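
If you work in PyTorch, it already ships a PReLU module whose slope is a learnable parameter; here is a minimal sketch of using it in place of ReLU:

```python
import torch
import torch.nn as nn

# nn.PReLU stores the negative-side slope as a learnable parameter
# (initialised to 0.25 by default), so the optimizer adjusts it along
# with the weights during training.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.PReLU(),            # one learnable slope shared across the layer
    nn.Linear(32, 1),
)

x = torch.randn(4, 10)
print(model(x).shape)                   # torch.Size([4, 1])
print(next(model[1].parameters()))      # the slope alpha the network learns
```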


Using a lower learning rate also often mitigates the problem.
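
For example (illustrative values only, not a recommendation for any particular problem), dropping the learning rate by an order of magnitude is often enough:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# A smaller step size makes it far less likely that a single update
# pushes a neuron's pre-activation permanently below zero.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # instead of, say, 1e-1
```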
