Boosting Technique in Ensemble Learning

Boosting Technique in Ensemble Learning

8 mins read845 Views Comment
Updated on Oct 3, 2023 12:17 IST

The below article covers the Boosting technique in Ensemble Learning.

2022_04_boosting.jpg

Theoretically, in Machine Learning, we focus on building one ML model for our application – such as a Naïve Bayes Classifier, Decision Tree Classifier, Linear Regression model, etc. We feed data into the model and train it to predict an outcome or make a decision.

But in the real world, different models perform well for different applications, and it’s mostly not advisable to rely on one specific model to give us the most accurate outcome. So, it is more practical to choose from various models by combining their outputs and obtaining a final model that best fits our application.

This final model is called an Ensemble Model. And the process of combining multiple models is known as Ensemble Learning. In this article, we will discuss a common Ensemble Learning technique in detail – Boosting

We are going to cover the following sections:

Quick Intro to Ensemble Models in Machine Learning

An ensemble model is created busing ensemble learning technique. Ensemble learning combining multiple classification and prediction model strategically to enhance their overall performance to solve a particular problem. The base models are generally weak learners, which eventually produce a strong learner or an ensemble model when combined together. 

The two most popular ensemble methods are:

Recommended online courses

Best-suited Data Science courses for you

Learn Data Science with these high-rated online courses

80 K
4 months
1.18 L
12 months
2.5 L
2 years
90 K
24 months
2.5 L
2 years
Free
4 weeks
1.24 L
48 months

What is Boosting in Ensemble Learning?

Boosting is an ensemble learning method that involves training homogenous weak learners sequentially such that a base model depends on the previously fitted base models. All these base learners are then combined in a very adaptive way to obtain an ensemble model. 

In boosting, the ensemble model is the weighted sum of all constituent base learners. There are two meta-algorithms in boosting that differentiate how the base models are aggregated:

  1. Adaptive Boosting (AdaBoost)
  2. Gradient Boosting

How Boosting Works?

Boosting consists of multiple weak learners that are fitted iteratively in a manner that each new learner gives more weight or is only trained with observations that have been poorly classified by the previous learners.

At the end of this process, we obtain a strong learner (ensemble model) with lesser bias than the individual base models composing it. Hence, boosting techniques help avoid the underfitting of the model. So, when a base model usually has low variance but high bias, we will implement boosting techniques. Another reason is that such models are generally less computationally expensive to fit.

Once we have decided upon the type of our base model, we need to ask a few questions:

  • What information from the previous learners will be considered when fitting the current learner?
  • How would the learners (base models) be aggregated?

How is a boosting model trained to make predictions?

Diagram
  1. Samples generated from the training set are assigned the same weight to start with. These samples are used to train a homogeneous weak learner or base model.
  1. The prediction error for a sample is calculated – the greater the error, the weight of the sample increases. Hence, the sample becomes more important for training the next base model.
  1. The individual learner is weighted too – does well on its predictions, gets a higher weight assigned to it. So, a model that outputs good predictions will have a higher say in the final decision.
  1. The weighted data is then passed on to the following base model, and steps 2) and 3) are repeated until the data is fitted well enough to reduce the error below a certain threshold.
  1. When new data is fed into the boosting model, it is passed through all individual base models, and each model makes its own weighted prediction.
  1. Weight of these models is used to generate the final prediction. The predictions are scaled and aggregated to produce a final prediction.
K-means Clustering in Machine Learning
Bagging Technique in Ensemble Learning

Adaptive Boosting (AdaBoost)

This algorithm updates the weights attached to each of the misclassified training data samples and of the corresponding weak learners.

  • As discussed above, once an individual base model is trained, the sample next in sequence is assigned a weight that signifies the prediction accuracy (or lack thereof).
  • The weighted sample is then used to train the next base learner which would intuitively focus more on the samples with greater weight assigned to them and try to make better predictions.
  • The results would be re-weighted for the misclassified samples and fed into the next individual learner.

Demo: Implementing AdaBoost

Click the below colab icon to run the demo.

google-collab

Problem Statement:

Let’s build a boosting ensemble model using AdaBoost. For this, we will implement a Decision Tree as the base learner using the scikit-learn library in Python. 

Dataset Description:

We are going to make use of the breast_cancer dataset already present in the scikit-learn library.

Here is a preview of the dataset (the full description is too long to include):

Text, letter

Description automatically generated

The target column is used to predict whether a tumor is cancerous or not.

Tasks to be performed:

  1. Load the data
  2. Split the data into training and testing sets
  3. Build a Decision Tree Classifier and get its Accuracy Score
  4. Build an AdaBoost Model and get its Accuracy Score
  5. Compare the Accuracy Scores

Step 1 – Load the data

 
from sklearn.datasets import load_breast_cancer
#Load the breast cancer dataset
x, y = load_breast_cancer(return_X_y=True)
Copy code

Step 2 – Split the data into training and testing sets

 
from sklearn.model_selection import train_test_split
#Split the dataset into 70% training set and 30% testing set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=23)
Copy code

Step 3 – Build a Decision Tree Classifier and get its Accuracy Score

 
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
#Train a Decision tree classifier
dtree = DecisionTreeClassifier(max_depth=1, random_state=23)
dtree.fit(x_train,y_train)
dt_pred = dtree.predict(x_test)
dt_acc = round(accuracy_score(y_test,dt_pred),3)
print(f"Decision Tree Classifier Accuracy Score: ", dt_acc)
Copy code

Step 4 – Build an AdaBoost Model and get its Accuracy Score

We will initialize this AdaBoost ensemble model with the following parameters:

  • base_estimator = Decision Tree (default)
  • n_estimators = 50 – Create 50 samples to train 50 decision tree base models
  • learning_rate = 0.6 – Shrinks the contribution of each learner model by the value given
 
from sklearn.ensemble import AdaBoostClassifier
#AdaBoost Model using Decision Tree Classifier
ada = AdaBoostClassifier(n_estimators=50,learning_rate=0.6)
ada.fit(x_train,y_train)
ada_pred = ada.predict(x_test)
ada_acc = round(accuracy_score(y_test,ada_pred),3)
print(f"Decision Tree AdaBoost Model Accuracy Score: ", ada_acc)
Copy code

Step 5 – Compare the Accuracy Scores

 
import numpy as np
import matplotlib.pyplot as plt
#Compare the Accuracy Scores through Visualization
plt.figure(figsize=(10,2))
plt.barh(np.arange(2),[dt_acc,ada_acc],
tick_label=['Decision Tree','AdaBoost'])
Copy code
chart1

Thus, we can see how AdaBoost improves the performance and accuracy of the above Decision Tree Classifier.

Gradient Boosting

This algorithm updates the values of the training data samples. Here, the weak learner models are combined sequentially using Gradient Descent. Let’s understand how this is done –

For every data sample, we compute the pseudo-residual. This value is basically the difference between the target and the predicted value. 

pseudo_residuals = Yᵗᵃʳᵍᵉᵗ – Yᵖʳᵉᵈ

These residuals indicate the direction in which the successive learner should be updated to get the right value of the data sample. 

  • At first, the pseudo-residuals are set to the average of the known targets.
  • With each weak learner model, we predict the pseudo-residuals obtained in the previous learner model. 
  • Pseudo-residuals thus obtained are the targets for the following weak learner model.

Demo: Performing Gradient Boosting

Click the below colab icon to run the demo.

google-collab

Problem Statement:

Let’s build a boosting ensemble model using Gradient Boosting. For this, we will make use of the gender classification dataset available on Kaggle.

Dataset Description:

The dataset has 7 features and a target variable:

  • longhair – This column contains 0’s and 1’s where 1 is “long hair” and 0 is “not long hair”
  • foreheadwidthcm – Width of the forehead in centimeters
  • foreheadheightcm – Height of the forehead in centimeters
  • nosewide – This column contains 0’s and 1’s where 1 is “wide nose” and 0 is “not wide nose”
  • noselong – This column contains 0’s and 1’s where 1 is “Long nose” and 0 is “not long nose”
  • lipsthin – This column contains 0’s and 1’s where 1 represents the “thin lips” while 0 is “Not thin lips”
  • distancenosetoliplong – This column contains 0’s and 1’s where 1 represents the “long distance between nose and lips” while 0 is “short distance between nose and lips”
  • gender – “Male” or “Female”

The gender column is the target column used to predict the gender of an individual.

Tasks to be performed:

  1. Read the data
  2. Split the data into training and testing sets
  3. Build a Decision Tree Classifier and get its Accuracy Score
  4. Build a Gradient Boost Model and get its Accuracy Score
  5. Compare the Accuracy Scores

Step 1 – Read the data

 
import pandas as pd
#Read the dataset
data = pd.read_csv('gender_classification_v7.csv')
Copy code

Step 2 – Split the data into training and testing sets

 
from sklearn.model_selection import train_test_split
#Divide the dependent and independent features from the dataframe
x=data.iloc[:,:-1]
y=data.iloc[:,-1]
#Divide the datatset into train and test sets keeping 10% for testing
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3, random_state=7)
Copy code

Step 3 – Build a Decision Tree Classifier and get its Accuracy Score

We are going to use a Decision Tree with fixed parameters as the base learner. 

 
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
#Train a Decision tree classifier
dtree = DecisionTreeClassifier(max_depth=1, random_state=23)
dtree.fit(x_train,y_train)
dt_pred = dtree.predict(x_test)
dt_acc = round(accuracy_score(y_test,dt_pred),3)
print(f"Decision Tree Classifier Accuracy Score: ", dt_acc)
Copy code

Step 4 – Build a Gradient Boost Model and get its Accuracy Score

We will initialize this Gradient Boost ensemble model with the following parameters:

  • n_estimators = 50 – Create 50 samples to train 50 decision tree base models
  • learning_rate = 0.6 – Shrinks the contribution of each learner model by the value given
 
from sklearn.ensemble import GradientBoostingClassifier
#Gradient Boost Model using Decision Tree Classifier
gb = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.5)
gb.fit(x_train,y_train)
gb_pred = gb.predict(x_test)
gb_acc = round(accuracy_score(y_test,gb_pred),3)
print(f"Decision Tree Gradient Boost Model Accuracy Score: ", gb_acc)
Copy code

Step 5 – Compare the Accuracy Scores

 
import numpy as np
import matplotlib.pyplot as plt
#Visualize the Accuracy Scores
plt.figure(figsize=(10,2))
plt.barh(np.arange(2),[dt_acc,gb_acc],
tick_label=['Decision Tree','Gradient Boost'])
Copy code
Chart2

Thus, we can see how Gradient Boosting improves the performance and accuracy of the above Decision Tree Classifier.

Endnotes

We have discussed how ensemble learning techniques aim at optimizing an ML model by alleviating the overfitting/underfitting problems. Artificial Intelligence & Machine Learning is an increasingly growing domain that has hugely impacted big businesses worldwide. Interested in being a part of this frenzy? Explore related articles here.


Top Trending Tech Articles:
Career Opportunities after BTech Online Python Compiler What is Coding Queue Data Structure Top Programming Language Trending DevOps Tools Highest Paid IT Jobs Most In Demand IT Skills Networking Interview Questions Features of Java Basic Linux Commands Amazon Interview Questions

Recently completed any professional course/certification from the market? Tell us what liked or disliked in the course for more curated content.

Click here to submit its review with Shiksha Online.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio