GridSearchCV and RandomizedSearchCV:Python code

3 mins read2.3K Views Comment

Updated on Sep 20, 2022 10:23 IST

Do you want to get good accuracy of your machine learning model? Different methods are there for it. One of the methods is hyperparameter tuning. Many data scientists while doing machine learning implementations deal with hyper-parameter tuning. Now, what does this term mean? Hyperparameters are more like handles available to control the behavior or the output of the algorithm. They are given to algorithms as arguments. The values can be changed and accordingly, the output will change. In this tutorial, we shall introduce GridSearchCV and RandomsearchCV and their implementation in Python.

GridSeachCV–

It is one of the most basic hyper-parameter techniques used. In this, all possible permutations of the hyperparameters for a particular model are used to build models. The performance is evaluated for each model and the best performing one is selected.

RandomizedSearchCV–

In this values for the different hyperparameters are picked up at random from this distribution. Let’s understand its python code. If you want to learn in detail about hyperparameters then u can click here.

About dataset

Problem statement: Diagonizing heart disease.

We have imported ‘heart.csv’ which is available free. This data set is available on Kaggle.com.You can download it from there. This dataset contains data for heart disease patients. This dataset includes the following features

age-age in year
sex
cp-chest pain type
trestbps-resting blood pressure (in mm Hg on admission to the hospital)
chol-serum cholestoral in mg/dl
fbs-(fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg-resting electrocardiographic results
thalach-maximum heart rate achieved
exang-exercise induced angina (1 = yes; 0 = no)
oldpeak-ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment — 0: downsloping; 1: flat; 2: upsloping 0: downsloping; 1: flat; 2: upsloping
ca: The number of major vessels (0–3)
thal: A blood disorder called thalassemia Value 0: NULL
target: Heart disease (1 = no, 0= yes)

Recommended online courses

Best-suited Machine Learning courses for you

Learn Machine Learning with these high-rated online courses

Master of Computer Applications with specialization in Machine Learning and Artificial Intelligence (Online MCA)

Amity OnlineDegree

Total Fees

₹1.7 L

Duration

2 years

Professional Certificate Course In Generative AI And Machine Learning

IIT KanpurCertificate

Total Fees

₹1.53 L

Duration

11 months

Advance Certification in Applied Data Science, Machine Learning & IoT

IIT GuwahatiCertificate

4.0

Total Fees

₹95 K

Duration

9 months

MCA in Machine Learning Online

Amity OnlineDegree

Total Fees

₹2.5 L

Duration

2 years

IIT Roorkee - Post Graduate Certificate Program in Data Science & Machine Learning (Online)

TimesProCertificate

4.0

Total Fees

₹2 L

Duration

10 months

Data Science & Machine Learning Course

Coding NinjasCertificate

4.8

Total Fees

₹34.65 K

Duration

11 months

MCA in Machine Learning

Amity University Online, NoidaDegree

Total Fees

₹2.5 L

Duration

2 years

M.Sc. in Machine Learning and AI

upGradDegree

Total Fees

₹5.6 L

Duration

18 months

IIT Roorkee & Wiley Post Graduate Certification in AI for BFSI

IIT RoorkeeCertificate

Total Fees

– / –

Duration

6 months

Full Stack Machine Learning & AI Program

Jigsaw AcademyCertificate

Total Fees

– / –

Duration

8 hours

Lets jump to Python code

1. Importing Libraries

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
Copy code

2. Reading the dataset

data = pd.read_csv('heart.csv')
data.head()
Copy code

3. Dependent and independent variables

#Get Target data 
y = data['target']
 
#Load X Variables into a Pandas Dataframe with columns 
X = data.drop(['target'], axis = 1)
Copy code

4. Splitting dataset into Training and Testing Set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=101)
Copy code

Next, we separate the independent predictor variables and the target variable into x and y. And then split both x and y into training and testing sets with the help of the train_test_split() function.

5. Implementing random forest classifier

rf_Model = RandomForestClassifier()
Copy code

Note: You can also implement a random forest regressor also.

6. Fitting the model

model = model.fit(X_train,y_train)
Copy code

7. Checking the accuracy score

print (f'Accuracy - : {model.score(X_test,y_test):.3f}')
Output:
Accuracy - : 0.820
Copy code

We get an accuracy of 82% without any hyperparameter tuning. Now let’s check the accuracy with GridSearchCV and RandomSearchCV.

With GridSearchCV

8. Implementing GridSearchCV

from sklearn.model_selection import GridSearchCV
Copy code

9. Defining the parameters

forest_params = [{'max_depth': list(range(10, 15)), 'max_features': list(range(0,14))}]
 
clf = GridSearchCV(model, forest_params, cv = 10, scoring='accuracy')
Copy code

cv(Cross-validation)=10: means ten splits are there.
max_depth: Max depth of the tree
max_features: Maximum number of features considered while splitting.

10. Fitting GridSearchCV

best_clf = clf.fit(X_train,y_train)
Copy code

11. Finding the best parameters

best_clf.best_estimator_
 
output:
RandomForestClassifier(max_depth=12, max_features=1)
Copy code

Finally, hyperparameter tuning is finding the best combination of hyperparameters that gives the best performance according to the defined scoring metric.

12. Finding accuracy score with GridSearchCV

print (f'Accuracy - : {best_clf.score(X_test,y_test):.3f}')
output:
Accuracy - : 0.852
Copy code

With RandomizedSearchCV

13. Implementing RandomizedSearchCV

from sklearn.model_selection import RandomizedSearchCV
forest_params = [{'max_depth': list(range(10, 25)), 'max_features': list(range(0,15))}]
rf_RandomGrid = RandomizedSearchCV(model, forest_params, cv = 10, scoring='accuracy')
Copy code

14. Fitting RandomSearchCV

rf_RandomGrid.fit(X_train, y_train)
Copy code

15. Finding the best parameters

best_clf.best_estimator_
Output: 
RandomForestClassifier(max_depth=12, max_features=1)
Copy code

16. Checking the accuracy using RandomSearchCV

print (f'Test Accuracy - : {rf_RandomGrid.score(X_test,y_test):.3f}') 
output: 
Test Accuracy - : 0.856
Copy code

Summary of this code

The accuracy is almost the same in both cases.RandomSearchCV has slightly better accuracy. Not all cases will end up like this fortunately for me it ended up like this but you’ll see depending on the dimensions and the model you are using.

Performance comparison

We implemented the Random Forest algorithm without hyperparameter tuning and got the lowest accuracy of 82 %. And then we implemented GridSearchCV and RandomSearchCV and checked the accuracy score with both techniques. We got better accuracies You might ask me now which of these searches are better now it depends on what kind of dimensionality data that you’re using so for smaller ones obviously grid search cv is going to be much better-performing but if you have larger data and larger dimensions of data then the best option to go with is the RandomizedSearchCV in comparison to GridSearchCV because it’s doing less number of combinations to get the best outcome.

Endnotes

In this blog, We took a random forest classifier example because this classifier has more features and checked how each hyperparameter works to alter the decision trees. We implemented the python code for GridSearchCV and RandomizedSearchCV. For a conceptual understanding of hyperparameter tuning, you can read my previous blog which covered this topic from a beginner’s point of view.

If you liked my blog consider hitting the stars below.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio

GridSearchCV and RandomizedSearchCV:Python code

GridSeachCV–

RandomizedSearchCV–

About dataset

Best-suited Machine Learning courses for you

Master of Computer Applications with specialization in Machine Learning and Artificial Intelligence (Online MCA)

Professional Certificate Course In Generative AI And Machine Learning

Advance Certification in Applied Data Science, Machine Learning & IoT

MCA in Machine Learning Online

IIT Roorkee - Post Graduate Certificate Program in Data Science & Machine Learning (Online)

Data Science & Machine Learning Course

MCA in Machine Learning

M.Sc. in Machine Learning and AI

IIT Roorkee & Wiley Post Graduate Certification in AI for BFSI

Full Stack Machine Learning & AI Program

Lets jump to Python code

1. Importing Libraries

2. Reading the dataset

3. Dependent and independent variables

4. Splitting dataset into Training and Testing Set

5. Implementing random forest classifier

6. Fitting the model

7. Checking the accuracy score

With GridSearchCV

8. Implementing GridSearchCV

9. Defining the parameters

10. Fitting GridSearchCV

11. Finding the best parameters

12. Finding accuracy score with GridSearchCV

With RandomizedSearchCV

13. Implementing RandomizedSearchCV

14. Fitting RandomSearchCV

15. Finding the best parameters

16. Checking the accuracy using RandomSearchCV

Summary of this code

Performance comparison

Endnotes

Top Picks & New Arrivals