Hi everyone. After my last list, I am making another list covering the various types of activation functions. This post includes many activation functions, not just the older, well-known ones.
ReLU Activation Function. ReLU outputs the maximum of 0 and x: when x is negative, the output is 0, and when x is positive, the output is x.
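As a minimal sketch (using NumPy for illustration; the function name is my own choice, not from the original post), ReLU can be written like this:

```python
import numpy as np

def relu(x):
    # Element-wise maximum of 0 and x: negative inputs become 0, positive inputs pass through.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```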
Leaky ReLU Activation Function. It is quite similar to the ReLU activation function, except that for negative inputs it has a small "leak": instead of outputting 0, it outputs a small fraction of x (for example 0.01x).
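A minimal sketch of Leaky ReLU, again using NumPy; the slope 0.01 is a common default chosen here purely for illustration:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # For x >= 0 the output is x; for x < 0 the output is the small "leak" alpha * x.
    return np.where(x >= 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.  ]
```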
ELU Activation Function. ELU, also known as the Exponential Linear Unit, is an activation function that is somewhat similar to ReLU, with some differences.
Swish Activation Function. Swish is a newer activation function, first proposed in 2017, which was found to outperform ReLU and its variants.
Sigmoid Activation Function. The sigmoid activation function, also known as the logistic function, is one of the classic activation functions used in neural networks. It squashes any input into the range (0, 1).
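A minimal sketch of the sigmoid, 1 / (1 + e^(-x)), in NumPy for illustration:

```python
import numpy as np

def sigmoid(x):
    # Logistic function: maps any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067 0.5    0.9933]
```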
Most people have never heard of Swish. It is one of the newer activation functions, proposed by researchers at Google, and it was discovered through an automated search guided by reinforcement learning. It behaves much like ReLU and its variants, yet when tested on standard datasets it was found to outperform ReLU, Leaky ReLU, and Parameterized ReLU, as well as ReLU-inspired functions such as ELU and Softplus.
Swish is defined as the product of x and sigmoid(x). For large positive values of x, sigmoid(x) is approximately 1, so the output is approximately x. For large negative values of x, sigmoid(x) is approximately 0, so the output of Swish approaches 0 as well.
Swish is continuous and differentiable at all points. Unlike ReLU and its variants, however, it is not a monotonic function: for negative inputs it dips slightly below 0 before approaching 0.
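A minimal sketch of Swish as x * sigmoid(x), reusing the sigmoid above (NumPy, for illustration only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish: x * sigmoid(x). Approaches x for large positive x and 0 for large negative x.
    return x * sigmoid(x)

print(swish(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# approx [-0.0005 -0.2689  0.      0.7311  9.9995]
```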
ELU stands for Exponential Linear Unit. It is very similar to the ReLU activation function. For inputs greater than 0, the output of ELU is the same as that of ReLU. For inputs less than 0, however, the output is exp(x) - 1 (the general form scales this by a constant alpha, which is commonly set to 1). One reason it can achieve better accuracy than other ReLU-like functions is that its mean activation is closer to 0, which has an effect similar to batch normalization. ELU is continuous at all points, differentiable at all points, and monotonic.
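A minimal sketch of ELU in NumPy; the default alpha=1.0 is the common choice described above and is included here only for illustration:

```python
import numpy as np

def elu(x, alpha=1.0):
    # For x > 0 the output is x (same as ReLU); for x <= 0 it is alpha * (exp(x) - 1),
    # which smoothly approaches -alpha and pulls the mean activation towards 0.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-3.0, -1.0, 0.0, 2.0])))  # approx [-0.9502 -0.6321  0.      2.    ]
```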
To learn more about these activation functions, and for more content such as Neo4j graph database guides, regular blog posts, and tutorials, please visit our site:
Deep Learning University
In future posts, we will talk about various other activation functions used in deep learning, including Softplus, Softmax, hyperbolic tangent (tanh), and Mish, among others. Stay tuned for more news, blog posts, tutorials, and facts.