Scikit Learn Tutorial
In this article, we'll learn about the Scikit-learn library in Python.
Introduction
Machine learning is becoming more popular, and businesses around the world are trying to harness the potential of their data. As a result, many tools and libraries have been created to make analysis simpler. Python is one of the most popular programming languages among data scientists because it offers a large number of libraries and tools for analysis.
Why Scikit-learn Library?
It is hard to find a single explanation online for why Scikit-learn has become so popular among data scientists, but it has several clear advantages that explain why companies adopt it.
A few advantages are listed below:
- It is a free and open-source library.
- It is easy to use.
- Numerous authors and contributors, along with a large international online community, support and update Scikit-learn.
- Various firms use Scikit-learn to predict consumer behaviour, identify suspicious activity, and much more.
- Users who want to connect the algorithms with their own platforms will find detailed API documentation on the Scikit-learn website.
What is Scikit-learn Library?
Scikit-learn is a free machine learning library for Python. It is built on Python's numerical and scientific libraries, NumPy and SciPy, and provides algorithms such as support vector machines (SVM), random forests, and k-nearest neighbors.
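The algorithms mentioned above share the same fit/predict/score interface, which is a large part of why the library is easy to use. The following is a minimal sketch of that idea, not a benchmark: it uses the bundled Iris data, and the hyperparameters and `random_state` values are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Load the bundled Iris data as NumPy arrays
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

# Three different algorithms, one common interface
# (settings below are illustrative assumptions, not tuned values)
estimators = {
    "SVM": SVC(),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, estimator in estimators.items():
    estimator.fit(X_train, y_train)                    # train
    scores[name] = estimator.score(X_test, y_test)     # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Swapping one model for another only changes the line that constructs the estimator; the training and evaluation code stays the same.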
Below are a few datasets that ship with the Scikit-learn library:
- Iris Dataset
- Boston House Dataset (deprecated in version 1.0 and removed in version 1.2)
- Digits Dataset
- Diabetes Dataset
- Breast Cancer Dataset
- Wine Recognition Dataset
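As a rough illustration, the bundled datasets listed above can be loaded through the `sklearn.datasets` module. The sketch below loads the ones still available in current releases and prints the shape of each feature matrix; the dictionary keys are just labels chosen for this example.

```python
from sklearn import datasets

# Map a short label (chosen for this example) to each loader function
loaders = {
    "iris": datasets.load_iris,
    "digits": datasets.load_digits,
    "diabetes": datasets.load_diabetes,
    "breast_cancer": datasets.load_breast_cancer,
    "wine": datasets.load_wine,
}

shapes = {}
for name, loader in loaders.items():
    bunch = loader()                  # a Bunch with .data, .target, .DESCR, ...
    shapes[name] = bunch.data.shape   # (n_samples, n_features)
    print(name, shapes[name])
```

Each loader returns a dictionary-like object whose `DESCR` attribute holds a text description of the dataset.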
Installing the Scikit-learn Library
Run the command below to install the Scikit-learn library, or to upgrade it if it is already installed.
pip install -U scikit-learn
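One simple way to confirm that the installation worked is to import the package and print its version. This is only a quick check, assuming the command was run in the same Python environment you are working in.

```python
import sklearn

# The import fails with ModuleNotFoundError if the installation did not succeed
version = sklearn.__version__
print("scikit-learn version:", version)
```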
The example in this tutorial uses the Iris dataset. It contains 3 classes (Setosa, Versicolor, and Virginica) with 50 instances each.
The attributes of the dataset are:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
  - Iris Setosa
  - Iris Versicolour
  - Iris Virginica
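The attributes and class sizes described above can be checked against the copy of Iris bundled with Scikit-learn. This is a small sketch; note that the bundled copy names its columns differently from the CSV file used later in this tutorial (for example `sepal length (cm)`).

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of NumPy arrays
iris_bunch = load_iris(as_frame=True)
frame = iris_bunch.frame  # four feature columns plus an integer 'target' column

print(iris_bunch.feature_names)   # the four measurements, in cm
print(iris_bunch.target_names)    # the three species names

# Number of rows per class, ordered by class index
class_counts = frame["target"].value_counts().sort_index()
print(class_counts)
```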
Without further delay, let us start building our model.
Step 1: Import the Required Libraries
import numpy as np
import pandas as pd
Step 2: Read the Data
iris = pd.read_csv("/content/Iris.csv")
iris.head()
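The path `/content/Iris.csv` assumes the file has been uploaded to a notebook environment. If the CSV is not at hand, a comparable DataFrame can be sketched from the bundled dataset. The variable name `iris_alt` is hypothetical, and the `Iris-` prefix on the species labels is an assumption about how the CSV spells them; the column names are taken from the code used later in this tutorial.

```python
import pandas as pd
from sklearn.datasets import load_iris

bunch = load_iris()

# Column names as used by the later steps of this tutorial
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
iris_alt = pd.DataFrame(bunch.data, columns=columns)

# Add a 1-based row identifier as the first column
iris_alt.insert(0, "Id", range(1, len(iris_alt) + 1))

# Species labels; the "Iris-" prefix is an assumption about the CSV's spelling
iris_alt["Species"] = ["Iris-" + str(bunch.target_names[i]) for i in bunch.target]

print(iris_alt.head())
```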
Step 3: Check the shape of the data
iris.shape
Step 4: Get information about the data using the info() function
iris.info()
Step 5: Get summary statistics of the data using the describe() function
iris.describe()
Step 6: Check the number of rows for each species
iris['Species'].value_counts()
Step 7: Split the data into training and testing set
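Now, let us build some ML models using the Scikit-learn library. We drop the Id column, which is only a row identifier, and use Species as the target.
from sklearn.model_selection import train_test_split

# Features: the four measurements ('Id' is a row identifier, not a predictor)
X = iris.drop(['Id', 'Species'], axis=1)
# Target: the species label
y = iris['Species']
print(X.shape)
print(y.shape)

# Hold out 40% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)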
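Because the split above is random, the results change from run to run. One possible variation, shown here as a sketch on the bundled Iris arrays, is to fix `random_state` for reproducibility and pass `stratify=y` so that each class keeps the same proportion in both sets. The seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state makes the split repeatable; stratify keeps class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)

# Count how many test rows fall in each of the three classes
test_counts = np.bincount(y_test)
print(test_counts)
```

With 150 rows and `test_size=0.4`, the test set holds 60 rows, and stratification gives 20 rows per species.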
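Step 8: Build a Random Forest classifier model
from sklearn.ensemble import RandomForestClassifier

# Train a forest of 100 decision trees on the training set
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Predict the species for the held-out test set
y_pred = model.predict(X_test)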
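A single train/test split gives only one estimate of how well the model generalises. As a complementary sketch, not part of the original walkthrough, k-fold cross-validation with `cross_val_score` averages the accuracy over several splits. The choice of 5 folds and the seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Fit and score the model on 5 different train/validation partitions
cv_scores = cross_val_score(model, X, y, cv=5)

print("Fold accuracies:", cv_scores)
print("Mean accuracy: %.3f (std %.3f)" % (cv_scores.mean(), cv_scores.std()))
```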
Step 9: Find out the Accuracy, confusion matrix and classification report for the model
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

# Fraction of test rows classified correctly
print("Accuracy of the Model is:", metrics.accuracy_score(y_test, y_pred))
# Rows are true classes, columns are predicted classes
print("Confusion matrix is", "\n", confusion_matrix(y_test, y_pred), "\n")
# Per-class precision, recall, and F1 score
print("Classification_Report is ", "\n", classification_report(y_test, y_pred), "\n")
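To see how accuracy relates to the confusion matrix, the sketch below uses a tiny hand-made set of labels (hypothetical values, not results from the model above). Correct predictions sit on the diagonal of the matrix, so accuracy is the diagonal sum divided by the total count.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_hat = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
labels = ["setosa", "versicolor", "virginica"]

# Rows are true classes, columns are predicted classes, in the order of `labels`
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(cm)

# Accuracy = correct predictions (diagonal) / all predictions
manual_accuracy = np.trace(cm) / cm.sum()
print(manual_accuracy, accuracy_score(y_true, y_hat))
```

Here one versicolor sample is predicted as virginica, so 5 of the 6 predictions are correct.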
Step 10: Check the feature importances as well
import pandas as pd

# Pair each importance score with its feature name, highest first
imp_feature = pd.Series(
    model.feature_importances_,
    index=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
).sort_values(ascending=False)
imp_feature
import matplotlib.pyplot as plt
import seaborn as sns
# Notebook magic for Jupyter/Colab; omit this line in a plain script
%matplotlib inline

# Create a horizontal bar plot of the feature importances
sns.barplot(x=imp_feature, y=imp_feature.index)

# Add labels to the graph
plt.xlabel('Important Feature Score')
plt.ylabel('Features')
plt.title("Important Features")
plt.show()
In this run, PetalWidthCm is the most important feature. Because the split and the forest are both random, the exact scores can vary between runs.
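The impurity-based scores in `feature_importances_` are computed from the training data. As a cross-check, one option is permutation importance on the held-out set, which measures how much the test accuracy drops when a single feature's values are shuffled. The sketch below is an additional technique, not part of the original steps; it uses the bundled Iris data, so the column names differ from the CSV, and the seeds and `n_repeats` value are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_iris(as_frame=True)
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature 10 times and record the drop in test accuracy
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Print features from most to least important
ranked = sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda item: item[1],
    reverse=True,
)
for name, mean, std in ranked:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```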
Conclusion
The growing popularity of machine learning calls for effective tools, and Scikit-learn (sklearn) in Python meets that need both for newcomers and for practitioners working on supervised learning problems. Scikit-learn is a popular choice among academic and industrial groups for a variety of tasks because of its efficiency and versatility.