Scikit Learn Tutorial

Updated on Mar 31, 2022 17:03 IST

In this article, we’ll learn about the scikit-learn library in Python.

Introduction

Machine learning keeps growing in popularity, and businesses around the world are trying to harness the potential of their data. As a result, a wide range of tools and software is being developed to make analysis simpler. Python is one of the most popular programming languages among data scientists because it offers a large number of libraries and tools for analysis.

Why the Scikit-learn Library?

Few online discussions spell out exactly why scikit-learn has become so popular among data scientists, but it has several clear advantages that explain why companies rely on it.

A few advantages are listed below:

  • It is a free and open-source library.
  • It is easy to use.
  • Numerous authors, contributors, and a large international online community support and update scikit-learn.
  • Various firms use scikit-learn to predict consumer behaviour, identify suspicious activity, and much more.
  • Users who want to connect the algorithms with their own platforms will find detailed API documentation on the scikit-learn website.

What is the Scikit-learn Library?

Scikit-learn is a free machine learning library for Python. It is built on Python's numerical and scientific libraries, NumPy and SciPy, and provides algorithms such as support vector machines (SVM), random forests, and k-nearest neighbors.
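To make the idea concrete, here is a minimal sketch of how the three algorithms mentioned above share one interface: each is created, trained with fit(), and queried with predict(). The built-in Iris data and the parameter values are chosen only for illustration, and they are assumptions rather than settings from this tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Small built-in dataset, used here only to have something to fit on.
X, y = load_iris(return_X_y=True)

# Three different algorithms, one shared interface: fit() then predict().
estimators = {
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k_neighbors": KNeighborsClassifier(n_neighbors=5),
}

predictions = {}
for name, estimator in estimators.items():
    estimator.fit(X, y)
    # Predict the class of the first five rows (integer class codes).
    predictions[name] = estimator.predict(X[:5])
    print(name, predictions[name])
```

Because every estimator follows the same pattern, swapping one algorithm for another usually means changing a single line.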

Below are a few datasets that are available inside the scikit-learn library:

  • Iris Dataset
  • Boston House Prices Dataset (removed from the library in scikit-learn 1.2)
  • Digits Dataset
  • Diabetes Dataset
  • Breast Cancer Dataset
  • Wine Recognition Dataset
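As a rough sketch, the bundled datasets can be loaded through the loader functions in sklearn.datasets. The Boston dataset is left out here because its loader was removed in scikit-learn 1.2. The dictionary keys are just labels chosen for this example.

```python
from sklearn import datasets

# Loader functions for bundled datasets that ship with current releases.
loaders = {
    "iris": datasets.load_iris,
    "digits": datasets.load_digits,
    "diabetes": datasets.load_diabetes,
    "breast_cancer": datasets.load_breast_cancer,
    "wine": datasets.load_wine,
}

shapes = {}
for name, loader in loaders.items():
    bunch = loader()
    # bunch.data holds the feature matrix: (number of samples, number of features)
    shapes[name] = bunch.data.shape
    print(f"{name}: {bunch.data.shape[0]} samples, {bunch.data.shape[1]} features")
```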

Installing the Scikit-learn Library

Run the command below to install or upgrade scikit-learn.

pip install -U scikit-learn
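
To confirm that the installation worked, a quick check along these lines can be run in Python. It only assumes that the package imports under the name sklearn.

```python
import sklearn

# Importing without an error and printing the version confirms the install.
version = sklearn.__version__
print("scikit-learn version:", version)
```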

The rest of this tutorial uses the Iris dataset. It contains 3 classes (Setosa, Versicolor, and Virginica) with 50 instances each.

The attributes of the dataset are:

  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm
  5. class:
    — Iris Setosa
    — Iris Versicolour
    — and Iris Virginica

Without further delay, let us start building our model.

Step 1: Import the Required Libraries

import numpy as np
import pandas as pd

Step 2: Read the Data

iris = pd.read_csv("/content/Iris.csv")
iris.head()
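
If the Iris.csv file is not at hand, a roughly equivalent DataFrame can be built from scikit-learn's bundled copy of the data. This is only a sketch: the column names mirror the ones used in later steps, while the "Iris-" prefix on the species labels is an assumption about how the CSV stores them.

```python
import pandas as pd
from sklearn.datasets import load_iris

bunch = load_iris()

# Column names chosen to match those referenced later in this tutorial.
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
iris = pd.DataFrame(bunch.data, columns=columns)

# Add a 1-based Id column in front, as the later steps drop an 'Id' column.
iris.insert(0, "Id", list(range(1, len(iris) + 1)))

# Assumed label format: "Iris-setosa", "Iris-versicolor", "Iris-virginica".
iris["Species"] = ["Iris-" + bunch.target_names[t] for t in bunch.target]

print(iris.head())
```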

Step 3: Check the shape of the data

iris.shape

Step 4: Get information about the data using the info() function

iris.info()

Step 5: Get summary statistics of the data using the describe() function

iris.describe()

Step 6: Check the number of rows for each species

iris['Species'].value_counts()

Step 7: Split the data into training and testing sets

Now, let us build some ML models using the scikit-learn library.

from sklearn.model_selection import train_test_split

# Features: every column except the row Id and the target label
X = iris.drop(['Id', 'Species'], axis=1)
# Target: the species name
y = iris['Species']
print(X.shape)
print(y.shape)

# Hold out 40% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
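
The split above is random, so it changes on every run. As a sketch of one way to make it repeatable, the example below fixes random_state and also passes stratify so that each class keeps the same share in both subsets. It uses the bundled Iris data instead of the CSV, and the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify=y keeps class proportions
# the same in the training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
# Number of test samples per class (class codes 0, 1, 2)
print(np.bincount(y_test))
```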

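Step 8: Build a random forest classifier model

from sklearn.ensemble import RandomForestClassifier

# Train a forest of 100 decision trees on the training set
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Predict the species for the held-out test set
y_pred = model.predict(X_test)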
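
Once a forest is trained, it can also classify a single new flower. The sketch below is self-contained and uses the bundled Iris data, so its column names and integer class codes differ from the CSV. The measurements are hypothetical values picked to look like a typical setosa.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris(as_frame=True)
X, y = data.data, data.target

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# One hypothetical flower; the columns must match the training columns.
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)

# predict() returns the class code; predict_proba() returns one
# probability per class, averaged over the trees in the forest.
predicted_class = model.predict(new_flower)[0]
probabilities = model.predict_proba(new_flower)[0]

print(data.target_names[predicted_class], probabilities)
```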

Step 9: Compute the accuracy, confusion matrix, and classification report for the model

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

print("Accuracy of the model is:", accuracy_score(y_test, y_pred))
print("Confusion matrix is:\n", confusion_matrix(y_test, y_pred), "\n")
print("Classification report is:\n", classification_report(y_test, y_pred), "\n")
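
To see how accuracy relates to the confusion matrix, here is a small worked sketch. The labels are hand-made for illustration and are not output from the model above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_hat = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

# Rows are true classes, columns are predicted classes (sorted label order).
cm = confusion_matrix(y_true, y_hat)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = diagonal sum / total.
accuracy_from_cm = np.trace(cm) / cm.sum()
print(accuracy_from_cm, accuracy_score(y_true, y_hat))
```

Here one versicolor sample is predicted as virginica, so 5 of the 6 predictions land on the diagonal and both calculations give the same accuracy.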

Step 10: Check the feature importances

# Pair each importance score with its column name and sort from high to low
imp_feature = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
imp_feature
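import matplotlib.pyplot as plt
import seaborn as sns

# The %matplotlib inline magic only works in notebooks, so it is left out here
# Create a horizontal bar plot of the importance scores
sns.barplot(x=imp_feature, y=imp_feature.index)
# Label the axes and add a title
# (plt.legend() is omitted because the plot has no labelled series)
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Feature Importances")
plt.show()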

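In the run shown here, PetalWidthCm is the most important feature. Because the train/test split and the forest are both randomized, the exact scores (and sometimes the order of the two petal features) can differ between runs.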
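
One way to judge how much to trust a single importance ranking is to refit the forest with a few different seeds and compare. The sketch below does that on the bundled Iris data, whose column names differ from the CSV. The choice of five seeds is arbitrary.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris(as_frame=True)
X, y = data.data, data.target

# Fit the same model with several seeds and collect the importances.
rows = []
for seed in range(5):
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    rows.append(forest.feature_importances_)

# One row per seed, one column per feature; each row sums to 1.
importances = pd.DataFrame(rows, columns=X.columns)

# Mean and standard deviation per feature, highest mean first.
summary = importances.agg(["mean", "std"]).T.sort_values("mean", ascending=False)
print(summary)
```

A feature whose mean score is well above the others, with a small standard deviation, is a safer pick than one that only leads in a single run.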

Conclusion

The popularity of machine learning creates a need for effective tools, and scikit-learn (sklearn) in Python meets that need for both newcomers and practitioners working on supervised learning problems. Its efficiency and versatility make it a popular choice among academic and industrial groups for a wide variety of tasks.
