Activation Functions: With Real-life analogy and Python Code
In this article, you will learn about activation functions with a real-life analogy. You will also learn why they are needed and what types are available.
Activation functions are an important component of neural networks. They help to determine the output of a neural network by applying mathematical transformations to the input signals received from other layers in a network. Activation functions allow for complex non-linear relationships between input and output data points.
The choice of which function is appropriate depends largely on the problem you are trying to solve with your model and on any constraints imposed by hardware capabilities or time and space limitations. This article will guide you on which activation function to use and when, and will explain activation functions with a real-life analogy.
Table of contents
- What are activation functions?
- Real-life analogy for activation functions
- Why use activation functions?
- Types of activation functions
- How to choose activation functions?
- Activation function Python code
- Summary table
What are Activation Functions?
A neural network activation function is a function that introduces nonlinearity into the model.
A neural network has multiple nodes in each layer, and in a fully connected network, every node in one layer is connected to every node in the next layer. First, let's look at computing the value of the first neuron in the second layer. Each neuron in the first layer is multiplied by a weight (the weight is learned by training), the multiplied values are added, and the sum is added to the bias (the bias is also learned). Finally, an activation function is applied to this sum to produce the neuron's output, as sketched below.
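As a minimal sketch of that computation (the input values, weights, and bias below are made-up illustrative numbers), it can be written in NumPy like this:

import numpy as np

# Illustrative values only: three first-layer outputs feeding one second-layer neuron.
inputs = np.array([0.5, -1.2, 3.0])    # outputs of the neurons in the first layer
weights = np.array([0.4, 0.1, -0.7])   # learned weights for this neuron
bias = 0.2                             # learned bias for this neuron

z = np.dot(inputs, weights) + bias     # weighted sum of the inputs plus the bias
output = np.tanh(z)                    # an activation function turns z into the neuron's output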
Real-life analogy for Activation Functions
Imagine that a neural network is a hose:
It takes in water (takes some input), carries it somewhere (modifies your input), and pushes the water out (produces some output).
Without an activation function, your hose will act more like a steel pipe: fixed and inflexible. Sometimes that's good enough; there is nothing wrong with using a pipe to deliver your water.
But if the water has to travel around bends and corners, a rigid steel pipe won't fit, no matter how you rotate it. An activation function is handy here because it allows your function to be more flexible.
In this case, a neural net with an activation function would act like a plastic garden hose. You can bend it to your specific needs and carry your water to many more places that would be impossible to reach with a steel pipe.
So, the purpose of an activation function is to add flexibility to your hose (nonlinearity to your neural net).
Why use activation functions?
1. The main objective of activation functions is to add non-linearity to the network so that it can model more intricate and varied relationships between inputs and outputs. In the absence of activation functions, the network would only be capable of performing linear transformations, which cannot adequately represent the complexity and nuances of real-world data. Since neural networks need to implement complex mapping functions, non-linear activation functions must be used to introduce the nonlinearity that allows approximating any function (see the sketch after this list).
2. Normalizing each neuron's output is another key benefit of activation functions. Depending on the inputs a neuron receives and the weights associated with those inputs, its raw output can range from extremely high to extremely low. Activation functions ensure that each neuron's output falls within a defined range, which makes the network easier to optimize during training.
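As a small sketch of the first point (the random weight matrices below are purely illustrative), stacking linear layers without an activation in between collapses into a single linear transformation, so nothing is gained by adding depth:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # weights of the first layer (illustrative values)
W2 = rng.normal(size=(2, 4))   # weights of the second layer
x = rng.normal(size=3)         # an input vector

two_layers = W2 @ (W1 @ x)     # two linear layers with no activation in between
one_layer = (W2 @ W1) @ x      # one linear layer with the combined weight matrix
print(np.allclose(two_layers, one_layer))   # True: the two networks are identical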
Types of Activation Functions
Sigmoid activation function
The sigmoid activation function is a mathematical function used in artificial neural networks for classification. It maps any input onto a value between 0 and 1, which can be interpreted as a probability and thresholded to give a true/false decision. A common example is an image recognition system deciding whether an object in an image is a cat: if the sigmoid output for that object exceeds 0.5, it is classified as "cat-like"; otherwise, it is not. The advantage of this activation function over others lies in its smoothness: small variations in the input do not change the output too much, making predictions more stable overall.
Note: This function suffers from vanishing gradient problems.
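A small sketch of why the note above holds: the sigmoid derivative is f'(x) = f(x) * (1 - f(x)), which shrinks towards zero as the input moves away from zero, so gradients passed back through many sigmoid units become vanishingly small.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)   # derivative of the sigmoid

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))   # the gradient drops from 0.25 towards 0 as x grows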
Tanh activation function
The tanh (hyperbolic tangent) activation function is similar to the sigmoid but has some distinct differences: it maps inputs onto values between -1 and 1 rather than between 0 and 1, allowing for more nuance when classifying objects into categories based on their similarity scores across all the features the network considers at once (i.e., multi-dimensional classification). Tanh also has better gradient properties than the sigmoid: its output is centred on zero and its slope around zero is steeper, so gradients can be propagated back through the layers with less distortion, which allows faster learning during training. This makes it well suited to deep learning applications where accuracy matters most.
Note: This function suffers from vanishing gradient problems.
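As a small comparison sketch, tanh maps inputs to the range (-1, 1) and is centred on zero, while the sigmoid maps to (0, 1); tanh also has a steeper slope around zero (derivative 1 at x = 0 versus 0.25 for the sigmoid), which is what the better gradient properties above refer to:

import numpy as np

x = np.linspace(-3, 3, 7)
print(np.tanh(x))             # values in (-1, 1), symmetric about zero
print(1 / (1 + np.exp(-x)))   # sigmoid values in (0, 1), centred on 0.5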
Softmax activation function
The softmax function is often described as a combination of multiple sigmoids. We know that the sigmoid returns a value between 0 and 1, which can be treated as the probability of a data point belonging to a particular class. Therefore, the sigmoid is often used for binary classification problems.
The softmax function can be used for multiclass classification problems. It returns the probability of a data point belonging to each unique class. The formula is f(x_i) = e^(x_i) / sum(e^(x_j)), where the sum runs over all classes j.
ReLU activation function
For hidden layers, ReLU is usually the most effective option, and it is computationally very efficient. However, it suffers from the dying ReLU problem: if the input is less than 0, the output is a constant 0, so the gradient there is also 0 and the neuron can stop learning.
Note: If you are unsure which activation function to choose, especially for hidden layers, go for the ReLU function.
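A minimal sketch of the dying ReLU behaviour described above: for negative inputs ReLU outputs a constant 0, so its gradient there is also 0 and no learning signal flows back through those neurons.

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)   # gradient is 1 for positive inputs, 0 otherwise

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(x))        # negative inputs are clamped to 0
print(relu_grad(x))   # zero gradient wherever the input was negative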
Leaky ReLU activation function
Leaky ReLU is the most popular and effective way to solve the dying ReLU problem. It is a variant of ReLU that adds a small slope in the negative direction so that neurons do not die. Instead of outputting 0 for z < 0, leaky ReLU allows a small constant non-zero gradient α (typically α = 0.01).
Exponential Linear Unit (ELU) activation function
The Exponential Linear Unit (ELU) is an activation function that, like ReLU, is used to speed up the training of neural networks. Its main advantage is that it uses the identity for positive values, which avoids the vanishing gradient problem and improves the learning properties of the model. For negative inputs it follows f(x) = α * (e^x - 1), where α is the ELU hyperparameter, normally set to 1.0, which controls the saturation point for negative inputs. The ELU function does have one drawback, though: the exponential makes it more expensive to compute than ReLU.
Unlike ReLU, ELU can output negative values, which brings the average unit activation closer to zero, reduces the bias shift during training, and improves learning speed. This makes ELU a great alternative to ReLU.
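ELU is not included in the Python code section further down, so here is a minimal NumPy sketch, assuming the usual definition f(x) = x for x > 0 and α * (e^x - 1) otherwise, with α = 1.0:

import numpy as np

def elu(x, alpha=1.0):
    # identity for positive inputs, a smooth exponential curve for negative ones
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(elu(x))   # negative inputs saturate towards -alpha instead of being cut to 0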
How to choose activation functions?
| Consideration | Activation Function |
|---|---|
| Non-linearity | Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, SELU |
| Differentiability | Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, SELU |
| Bounded output range | Sigmoid, Softmax |
| Computational efficiency | ReLU, Leaky ReLU, ELU, SELU |
| Resistance to saturation | ReLU, Leaky ReLU, ELU, SELU |
Other points to remember
- If the network is being used for binary classification, a sigmoid function with an output range between 0 and 1 would be suitable.
- For multiclass classification, use the Softmax activation function.
- For other tasks such as anomaly detection, recommendation systems, or reinforcement learning, other activation functions such as the ReLU or the tanh functions may be used, depending on the specifics of the problem.
- Some activation functions, such as sigmoid and tanh, may saturate at extreme values, leading to slower learning. In such cases, it may be better to use a function that does not saturate, such as ReLU.
- For the hidden layers, the best choice is usually ReLU.
Note: Other activation functions are available besides those listed here, and the choice of the optimal activation function may depend on the specific problem and neural network architecture.
Activation function Python code
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)
These functions use the NumPy library to perform element-wise operations on arrays. Here are some examples of how to use these functions:
x = np.array([-1, 0, 1])
# Sigmoid
print(sigmoid(x))      # [0.26894142 0.5 0.73105858]

# Tanh
print(tanh(x))         # [-0.76159416 0. 0.76159416]

# ReLU
print(relu(x))         # [0 0 1]

# Leaky ReLU
print(leaky_relu(x))   # [-0.01 0. 1. ]

# Softmax
print(softmax(x))      # [0.09003057 0.24472847 0.66524096]
Output
[0.26894142 0.5 0.73105858]
[-0.76159416 0. 0.76159416]
[0 0 1]
[-0.01 0. 1. ]
[0.09003057 0.24472847 0.66524096]
Code explanation
- Sigmoid function: The sigmoid function is a widely used tool for binary classification problems, where it maps any input value to a value between 0 and 1. This allows us to interpret the output as representing the probability of a positive class.
- tanh function: It is similar in structure to the sigmoid but produces values over a larger range, mapping inputs to values between -1 and 1.
- ReLU (Rectified Linear Unit) function: It returns the input if it is positive and zero otherwise. A variant called leaky ReLU prevents neurons from always outputting zero by instead returning the input multiplied by a small positive constant whenever the input is negative.
- Softmax function: It returns a probability distribution over all possible classes given its inputs; this can be useful for multiclass classification tasks such as image recognition, where multiple objects could appear in one image simultaneously.
Summary table
| Activation Function | Equation | Derivative |
|---|---|---|
| Sigmoid | f(x) = 1 / (1 + e^(-x)) | f'(x) = f(x) * (1 - f(x)) |
| Softmax | f(x_i) = e^(x_i) / sum(e^(x_j)), for all j | df(x_i)/dx_j = f(x_i) * (1 - f(x_j)) if i = j; -f(x_i) * f(x_j) otherwise |
| Tanh | f(x) = (e^(x) - e^(-x)) / (e^(x) + e^(-x)) | f'(x) = 1 - f(x)^2 |
| ReLU | f(x) = max(0, x) | f'(x) = 1 if x > 0; 0 otherwise |
| Leaky ReLU | f(x) = max(0.01x, x) | f'(x) = 1 if x > 0; 0.01 otherwise |
Note: In the softmax activation function, i indexes the class whose probability is being computed and j runs over all classes in the sum.
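As a quick sanity check on the table (a sketch using a central finite difference), the closed-form derivatives can be verified numerically, shown here for the sigmoid:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x, h = 0.7, 1e-6
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # finite-difference estimate
analytical = sigmoid(x) * (1 - sigmoid(x))                # formula from the table
print(numerical, analytical)   # the two values agree to several decimal places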
Conclusion
An activation function is very important in solving complicated problems. It is what allows our model to perform well on non-linear data. We have tried to describe all the important activation functions using mathematical formulas and Python code. If this article helped you and you want to learn more about such concepts, please motivate us by liking it and sharing it with your friends.
KEEP LEARNING!!!
KEEP SHARING!!!