Machine learning is becoming more popular, and businesses around the world are trying to harness the potential of their data. As a result, a wide range of tools and software is being developed to make analysis simpler. Python is one of the most popular programming languages among data scientists because it offers a large number of libraries and tools for analysis.

Why Scikit-learn Library?

Few discussions online explain exactly why Scikit-learn has become popular among data scientists, but it has clear advantages that explain why companies use it.

A few advantages are listed below:

It is a free and open-source library
It is very easy to use
Numerous authors, contributors, and a large international online community support and update Scikit-learn
Various firms use Scikit-learn to predict consumer behaviour, identify suspicious activity, and much more
Users who want to connect the algorithms with their platforms will find detailed API documentation on the Scikit-learn website

What is Scikit-learn Library?

Scikit-learn is a free machine learning library for Python. It is built on Python's numerical and scientific libraries, NumPy and SciPy, and provides algorithms such as support vector machines (SVM), random forests, and k-nearest neighbors.

Below are a few of the datasets that ship with the Scikit-learn library:

Iris Dataset
Boston House Prices Dataset (removed from the library in scikit-learn 1.2)
Digits Dataset
Diabetes Dataset
Breast Cancer Dataset
Wine Recognition Dataset
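
As a minimal sketch of how these bundled datasets can be loaded, the loaders in sklearn.datasets return an object with the feature matrix in .data and the labels in .target. The variable names below (iris_bunch, wine_bunch) are illustrative only.

```python
from sklearn.datasets import load_iris, load_wine

# Each loader returns a Bunch: .data holds the features, .target the labels.
iris_bunch = load_iris()
print(iris_bunch.data.shape)            # (150, 4)
print(list(iris_bunch.target_names))    # class names for the three species
print(iris_bunch.feature_names)         # the four measurements, in cm

# Other bundled datasets follow the same pattern.
wine_bunch = load_wine()
print(wine_bunch.data.shape)
```

No file download is needed for these loaders because the data is packaged with the library.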

Installing Scikit Library

Run the command below to install Scikit-learn (the -U flag upgrades it if an older version is already present).

pip install -U scikit-learn
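
To confirm that the installation worked, one simple check is to import the package and print its version. This is only a sanity check; the variable name installed_version is illustrative.

```python
import sklearn

# If this import succeeds, the package is installed in the active environment.
installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```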

The Iris dataset, used in the walkthrough below, contains 3 classes (Setosa, Versicolor, and Virginica) with 50 instances each.

The attributes of the dataset are:

sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
class:
- Iris Setosa
- Iris Versicolor
- Iris Virginica

Without further delay, let us start building our model.

Step 1: Import the Required Libraries

import numpy as np
import pandas as pd

Step 2: Read the Data

iris = pd.read_csv("/content/Iris.csv")
iris.head()
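
If Iris.csv is not available at that path, a similar table can be built from the copy bundled with Scikit-learn. The sketch below assumes the CSV uses the column names referenced later in this tutorial (Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species); the "Iris-" prefix on the species labels and the name iris_alt are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

bunch = load_iris()

# Column names chosen to match the ones used in the rest of the tutorial.
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
iris_alt = pd.DataFrame(bunch.data, columns=columns)

# Add a 1-based row Id and a text label for the species (label format assumed).
iris_alt.insert(0, "Id", range(1, len(iris_alt) + 1))
iris_alt["Species"] = [f"Iris-{bunch.target_names[t]}" for t in bunch.target]

print(iris_alt.head())
```

Assigning iris_alt to iris would let the remaining steps run unchanged under these assumptions.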

Step 3: Check the shape of the data

iris.shape

Step 4: Get the information of the data using info() function

iris.info()

Step 5: Get the description of the data using describe() function

iris.describe()

Step 6: Check the Distinct count of the species

iris['Species'].value_counts()

Step 7: Split the data into training and testing set

Now let us build an ML model using the Scikit-learn library.

from sklearn.model_selection import train_test_split

# Features: every column except the row Id and the target label
X = iris.drop(['Id', 'Species'], axis=1)
# Target: the species name
y = iris['Species']
print(X.shape)
print(y.shape)
# Hold out 40% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
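
The split above is random, so results change on every run. One possible variation, sketched here on the bundled copy of the data, fixes random_state for reproducibility and uses stratify so each species keeps the same share in both sets. The seed value 42 and the names X_demo, X_tr, and so on are arbitrary choices for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load features and labels as pandas objects.
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

# random_state makes the split repeatable; stratify keeps class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)   # (90, 4) (60, 4)
print(y_te.value_counts().to_dict())
```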

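Step 8: Build a Random Forest classifier model

from sklearn.ensemble import RandomForestClassifier

# Train a random forest of 100 trees on the training set
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
# Predict the species of the held-out test rows
y_pred = model.predict(X_test)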
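
A single train/test split gives only one estimate of performance. As a rough sketch of a more stable check, 5-fold cross-validation trains and scores the same kind of model on five different splits. The fold count, the fixed seed, and the names clf and scores are assumptions for this example, not part of the original walkthrough.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X_all, y_all = load_iris(return_X_y=True)

# Same model type as in Step 8, with a fixed seed so the run is repeatable.
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# cv=5 splits the data into five folds; each fold is used once for testing.
scores = cross_val_score(clf, X_all, y_all, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```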

Step 9: Find out the Accuracy, confusion matrix and classification report for the model

from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

print("Accuracy of the Model is:", metrics.accuracy_score(y_test, y_pred))
print("Confusion matrix is", "\n\n", confusion_matrix(y_test, y_pred), "\n")
print("Classification_Report is ", "\n\n", classification_report(y_test, y_pred), "\n")

Step 10: Check the feature importances

# Pair each importance score with its feature name, highest first
imp_feature = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
imp_feature

import matplotlib.pyplot as plt
import seaborn as sns
# Notebook magic (Jupyter/Colab only); remove this line in a plain Python script
%matplotlib inline
# Create a horizontal bar plot of the importance scores
sns.barplot(x=imp_feature, y=imp_feature.index)
# Add labels to the graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Important Features")
plt.show()

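In this run, PetalWidthCm is the most important feature. Because the train/test split and the forest itself are randomized, the exact scores can vary from run to run.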
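
The scores from feature_importances_ are computed on the training data. As a hedged cross-check, permutation importance measures how much test accuracy drops when each feature is shuffled. The sketch below uses the bundled data, whose column names differ from those in Iris.csv; the seed, n_repeats value, and variable names are assumptions for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=0, stratify=y_demo
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and record the accuracy drop.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)

perm_scores = pd.Series(result.importances_mean, index=X_demo.columns)
print(perm_scores.sort_values(ascending=False))
```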


Conclusion

The growing popularity of machine learning calls for effective tools, and sklearn in Python meets that demand for both newcomers and those working on supervised learning problems. Scikit-learn is a popular choice among academic and industrial groups for a variety of tasks because of its efficiency and versatility.


About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...
