Boosting Technique in Ensemble Learning

Updated on Oct 3, 2023 12:17 IST

This article covers the Boosting technique in Ensemble Learning.


Typically, in Machine Learning, we focus on building a single ML model for our application – such as a Naïve Bayes classifier, a Decision Tree classifier, or a Linear Regression model. We feed data into the model and train it to predict an outcome or make a decision.

But in the real world, different models perform well on different applications, and it is usually not advisable to rely on one specific model to give us the most accurate outcome. So, it is more practical to combine the outputs of various models and obtain a final model that best fits our application.

This final model is called an Ensemble Model, and the process of combining multiple models is known as Ensemble Learning. In this article, we will discuss a common Ensemble Learning technique in detail – Boosting.

We are going to cover the following sections:

  • Quick Intro to Ensemble Models in Machine Learning
  • What is Boosting in Ensemble Learning?
  • How Does Boosting Work?
  • Adaptive Boosting (AdaBoost)
  • Gradient Boosting
  • Endnotes

Quick Intro to Ensemble Models in Machine Learning

An ensemble model is created using the ensemble learning technique. Ensemble learning strategically combines multiple classification and prediction models to enhance their overall performance on a particular problem. The base models are generally weak learners, which together produce a strong learner – the ensemble model.

The two most popular ensemble methods are Bagging and Boosting.
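
For a quick, purely illustrative sketch of the idea of combining models (this is not boosting itself, and the choice of base models and dataset here is just an assumption for the example), scikit-learn's VotingClassifier aggregates the predictions of a few simple base models by majority vote:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

#Load a toy dataset
x, y = load_breast_cancer(return_X_y=True)

#Combine three different base models by majority vote
ensemble = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=5000)),
    ('nb', GaussianNB()),
    ('dt', DecisionTreeClassifier(max_depth=3))
])

#Cross-validated accuracy of the combined model
print(cross_val_score(ensemble, x, y, cv=5).mean())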


What is Boosting in Ensemble Learning?

Boosting is an ensemble learning method that trains homogeneous weak learners sequentially, such that each base model depends on the previously fitted base models. All these base learners are then combined in a very adaptive way to obtain an ensemble model.

In boosting, the ensemble model is the weighted sum of all constituent base learners. There are two meta-algorithms in boosting that differentiate how the base models are aggregated:

  1. Adaptive Boosting (AdaBoost)
  2. Gradient Boosting

How Does Boosting Work?

Boosting consists of multiple weak learners that are fitted iteratively, in such a manner that each new learner gives more weight to (or is trained only on) the observations that were poorly classified by the previous learners.

At the end of this process, we obtain a strong learner (the ensemble model) with lower bias than the individual base models composing it. Hence, boosting techniques help avoid underfitting. So, when a base model has low variance but high bias, we will implement boosting techniques. Another reason is that such models are generally less computationally expensive to fit.

Once we have decided upon the type of our base model, we need to ask a few questions:

  • What information from the previous learners will be considered when fitting the current learner?
  • How would the learners (base models) be aggregated?

How is a boosting model trained to make predictions?

  1. Samples generated from the training set are all assigned the same weight to start with. These samples are used to train a homogeneous weak learner or base model.
  2. The prediction error for each sample is calculated – the greater the error, the more the sample's weight increases. Hence, the sample becomes more important for training the next base model.
  3. The individual learner is weighted too – a learner that does well on its predictions gets a higher weight assigned to it. So, a model that outputs good predictions will have a higher say in the final decision.
  4. The re-weighted data is then passed on to the following base model, and steps 2 and 3 are repeated until the error is reduced below a certain threshold.
  5. When new data is fed into the boosting model, it is passed through all the individual base models, and each model makes its own weighted prediction.
  6. The weights of these models are used to generate the final prediction – the predictions are scaled and aggregated to produce a single output. A minimal sketch of this loop is shown below.
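
The following is a minimal sketch of this training loop, assuming decision stumps as the weak learners and the classic AdaBoost-style weight update (the demos later in this article use scikit-learn's built-in classes instead):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 0, -1, 1)  #use -1/+1 labels so the weighted vote is a simple sign

#Step 1 - start with equal sample weights
n_samples = len(y)
weights = np.full(n_samples, 1 / n_samples)
learners, alphas = [], []

for _ in range(10):  #train 10 weak learners sequentially
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(x, y, sample_weight=weights)
    pred = stump.predict(x)
    #Step 2 - weighted error of this learner
    err = np.sum(weights[pred != y]) / np.sum(weights)
    #Step 3 - weight (the "say") of the learner itself
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
    #Step 4 - increase the weights of misclassified samples for the next learner
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()
    learners.append(stump)
    alphas.append(alpha)

#Steps 5-6 - each learner makes a weighted prediction; the sign of the aggregated sum is the final output
final = np.sign(sum(a * m.predict(x) for a, m in zip(learners, alphas)))
print("Training accuracy:", (final == y).mean())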

Adaptive Boosting (AdaBoost)

This algorithm updates the weights attached to the misclassified training data samples, as well as the weights of the corresponding weak learners.

  • As discussed above, once an individual base model is trained, each training sample is assigned a weight that signifies how accurately (or inaccurately) it was predicted.
  • The weighted samples are then used to train the next base learner, which intuitively focuses more on the samples with greater weights assigned to them and tries to make better predictions on them.
  • The misclassified samples are re-weighted again and fed into the next individual learner.
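
To put rough numbers on this (using the standard AdaBoost update sketched earlier): a learner with a weighted error of 0.2 receives a weight of 0.5 × ln(0.8 / 0.2) ≈ 0.69, and every sample it misclassified has its weight multiplied by e^0.69 ≈ 2, roughly doubling that sample's importance when training the next learner.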

Demo: Implementing AdaBoost


Problem Statement:

Let’s build a boosting ensemble model using AdaBoost. For this, we will implement a Decision Tree as the base learner using the scikit-learn library in Python. 

Dataset Description:

We are going to make use of the breast_cancer dataset already present in the scikit-learn library.

The dataset contains 569 samples, each described by 30 numeric features computed from digitized images of breast mass cell nuclei (the full description bundled with scikit-learn is too long to include here).

The target column is used to predict whether a tumor is malignant or benign.

Tasks to be performed:

  1. Load the data
  2. Split the data into training and testing sets
  3. Build a Decision Tree Classifier and get its Accuracy Score
  4. Build an AdaBoost Model and get its Accuracy Score
  5. Compare the Accuracy Scores

Step 1 – Load the data

 
from sklearn.datasets import load_breast_cancer
#Load the breast cancer dataset
x, y = load_breast_cancer(return_X_y=True)

Step 2 – Split the data into training and testing sets

 
from sklearn.model_selection import train_test_split
#Split the dataset into 70% training set and 30% testing set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=23)

Step 3 – Build a Decision Tree Classifier and get its Accuracy Score

 
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
#Train a Decision tree classifier
dtree = DecisionTreeClassifier(max_depth=1, random_state=23)
dtree.fit(x_train,y_train)
dt_pred = dtree.predict(x_test)
dt_acc = round(accuracy_score(y_test,dt_pred),3)
print(f"Decision Tree Classifier Accuracy Score: ", dt_acc)

Step 4 – Build an AdaBoost Model and get its Accuracy Score

We will initialize this AdaBoost ensemble model with the following parameters:

  • base_estimator = Decision Tree (the default base learner)
  • n_estimators = 50 – train 50 decision tree base models sequentially
  • learning_rate = 0.6 – shrinks the contribution of each base learner by the given value
 
from sklearn.ensemble import AdaBoostClassifier
#AdaBoost Model using Decision Tree Classifier
ada = AdaBoostClassifier(n_estimators=50,learning_rate=0.6)
ada.fit(x_train,y_train)
ada_pred = ada.predict(x_test)
ada_acc = round(accuracy_score(y_test,ada_pred),3)
print(f"Decision Tree AdaBoost Model Accuracy Score: ", ada_acc)

Step 5 – Compare the Accuracy Scores

 
import numpy as np
import matplotlib.pyplot as plt
#Compare the Accuracy Scores through Visualization
plt.figure(figsize=(10,2))
plt.barh(np.arange(2), [dt_acc, ada_acc],
         tick_label=['Decision Tree', 'AdaBoost'])
plt.show()
[Bar chart comparing the accuracy scores of the Decision Tree and AdaBoost models]

Thus, we can see how AdaBoost improves the performance and accuracy of the above Decision Tree Classifier.

Gradient Boosting

Instead of re-weighting the samples, this algorithm updates the target values that each successive learner is trained on. Here, the weak learner models are combined sequentially using Gradient Descent on the loss function. Let’s understand how this is done –

For every data sample, we compute the pseudo-residual. This value is basically the difference between the target and the predicted value. 

pseudo_residuals = y_target - y_pred

These residuals indicate the direction in which the successive learner should be updated to get the right value of the data sample. 

  • At first, the model’s prediction is set to the average of the known targets, and the pseudo-residuals are computed against it.
  • Each weak learner model is then trained to predict the pseudo-residuals left by the previous learner model.
  • The pseudo-residuals obtained after this step become the targets for the following weak learner model, as shown in the sketch below.
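
Here is a minimal sketch of this residual-fitting loop for a regression target, using small regression trees and a learning rate purely for illustration (the dataset and hyperparameters are assumptions; the demo below uses scikit-learn's GradientBoostingClassifier instead):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

x, y = load_diabetes(return_X_y=True)

#Start from the average of the known targets
prediction = np.full(len(y), y.mean())
learning_rate = 0.1
trees = []

for _ in range(50):
    #Pseudo-residuals: y_target - y_pred
    residuals = y - prediction
    #The next weak learner is trained to predict the residuals
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(x, residuals)
    #Update the running prediction by a shrunken step in the residual direction
    prediction += learning_rate * tree.predict(x)
    trees.append(tree)

print("Mean squared error:", np.mean((y - prediction) ** 2))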

Demo: Performing Gradient Boosting


Problem Statement:

Let’s build a boosting ensemble model using Gradient Boosting. For this, we will make use of the gender classification dataset available on Kaggle.

Dataset Description:

The dataset has 7 features and a target variable:

  • long_hair – contains 0s and 1s, where 1 is “long hair” and 0 is “not long hair”
  • forehead_width_cm – width of the forehead in centimeters
  • forehead_height_cm – height of the forehead in centimeters
  • nose_wide – contains 0s and 1s, where 1 is “wide nose” and 0 is “not wide nose”
  • nose_long – contains 0s and 1s, where 1 is “long nose” and 0 is “not long nose”
  • lips_thin – contains 0s and 1s, where 1 is “thin lips” and 0 is “not thin lips”
  • distance_nose_to_lip_long – contains 0s and 1s, where 1 is “long distance between nose and lips” and 0 is “short distance between nose and lips”
  • gender – “Male” or “Female”

The gender column is the target column used to predict the gender of an individual.

Tasks to be performed:

  1. Read the data
  2. Split the data into training and testing sets
  3. Build a Decision Tree Classifier and get its Accuracy Score
  4. Build a Gradient Boost Model and get its Accuracy Score
  5. Compare the Accuracy Scores

Step 1 – Read the data

 
import pandas as pd
#Read the dataset
data = pd.read_csv('gender_classification_v7.csv')

Step 2 – Split the data into training and testing sets

 
from sklearn.model_selection import train_test_split
#Divide the dependent and independent features from the dataframe
x=data.iloc[:,:-1]
y=data.iloc[:,-1]
#Divide the dataset into train and test sets, keeping 30% for testing
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3, random_state=7)

Step 3 – Build a Decision Tree Classifier and get its Accuracy Score

We are going to use a Decision Tree with fixed parameters as the base learner. 

 
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
#Train a Decision tree classifier
dtree = DecisionTreeClassifier(max_depth=1, random_state=23)
dtree.fit(x_train,y_train)
dt_pred = dtree.predict(x_test)
dt_acc = round(accuracy_score(y_test,dt_pred),3)
print(f"Decision Tree Classifier Accuracy Score: ", dt_acc)

Step 4 – Build a Gradient Boost Model and get its Accuracy Score

We will initialize this Gradient Boost ensemble model with the following parameters:

  • n_estimators = 1000 – train 1000 decision tree base models sequentially
  • learning_rate = 0.5 – shrinks the contribution of each base learner by the given value
 
from sklearn.ensemble import GradientBoostingClassifier
#Gradient Boost Model using Decision Tree Classifier
gb = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.5)
gb.fit(x_train,y_train)
gb_pred = gb.predict(x_test)
gb_acc = round(accuracy_score(y_test,gb_pred),3)
print(f"Decision Tree Gradient Boost Model Accuracy Score: ", gb_acc)

Step 5 – Compare the Accuracy Scores

 
import numpy as np
import matplotlib.pyplot as plt
#Visualize the Accuracy Scores
plt.figure(figsize=(10,2))
plt.barh(np.arange(2), [dt_acc, gb_acc],
         tick_label=['Decision Tree', 'Gradient Boost'])
plt.show()
[Bar chart comparing the accuracy scores of the Decision Tree and Gradient Boost models]

Thus, we can see how Gradient Boosting improves the performance and accuracy of the above Decision Tree Classifier.

Endnotes

We have discussed how ensemble learning techniques aim at optimizing an ML model by alleviating overfitting and underfitting problems. Artificial Intelligence & Machine Learning is a rapidly growing domain that has hugely impacted big businesses worldwide. Interested in being a part of this frenzy? Explore related articles here.


