Heatmap in Seaborn

Updated on Aug 27, 2024 17:31 IST

Introduction

In Machine Learning and Data Science, when working with data, you’re sure to perform Exploratory Data Analysis (EDA) to analyze the data before getting on with model development.

EDA helps summarize the main characteristics of your data, mostly employing data visualization methods.

Seaborn is a very popular data visualization library in Python. It is built on top of Python’s Matplotlib library and offers an easy, intuitive, yet highly customizable API for data visualization.

For this article, we are going to focus on Heatmap in Seaborn – a common technique used to observe relationships between variables in your data through color-coding. Let’s see how to perform EDA with heatmaps in Python using Seaborn.

Introduction to Heatmaps

A heat map is a graphical representation of multivariate data structured as a matrix of rows and columns, in which the color of each cell encodes its value.

Heat maps are very useful in describing correlation among several numerical variables, visualizing patterns and anomalies.

What is meant by correlation?

Correlation is a dimensionless measure of the degree to which two variables are related.
It captures both the strength and the direction of the linear relationship between them.
The correlation coefficient lies between -1 and 1; its absolute value, between 0 and 1, depicts strength.
The + and – signs depict direction.
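
To make the definition concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The helper name pearson_r and the toy numbers are assumptions made for illustration; they are not taken from the dataset used later in this article.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator

# Hypothetical toy values, chosen only to show both signs
engine_hp = [100, 150, 200, 250, 300]
city_mpg = [30, 26, 22, 18, 14]   # drops by a fixed amount per step: exact inverse line
price = [20, 24, 33, 41, 52]      # rises with horsepower, but not perfectly linearly

r_negative = pearson_r(engine_hp, city_mpg)
r_positive = pearson_r(engine_hp, price)
print(round(r_negative, 2))  # the sign is negative: the variables move in opposite directions
print(round(r_positive, 2))  # the sign is positive: the variables move together
```

The sign of each result gives the direction and the absolute value gives the strength, which is exactly what the color of a heat map cell will encode.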

Correlation Matrix

A correlation matrix is a table that holds the correlation coefficient for every pair of variables at once.

A heat map represents these coefficients to visualize the strength of correlation among variables. It helps you spot features that are strongly related to each other or to the target, which is useful when selecting features for Machine Learning model building.

The heat map transforms the correlation matrix into color coding.

The correlation matrix shows how the variables are correlated to each other on a scale of -1 to 1, with 1 being a perfect positive correlation and -1 being a perfect inverse correlation.
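
As a small illustration of what such a matrix looks like, the sketch below builds one with pandas from a made-up three-column table. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy table, not the article's dataset
toy = pd.DataFrame({
    "hp": [100, 150, 200, 250, 300],
    "mpg": [30, 26, 22, 18, 14],
    "price": [20, 24, 33, 41, 52],
})

# One coefficient per pair of columns
toy_corr = toy.corr()
print(toy_corr.round(2))
```

Two properties are worth noticing: the diagonal is always 1, because every variable is perfectly correlated with itself, and the matrix is symmetric, because the correlation of A with B equals that of B with A.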

Now, we will understand how to create a heat map to determine the correlation between multiple variables.

The dataset used in this blog can be found here. This dataset contains information on cars such as their make, model, year, engine, and other properties.

We need to ascertain if there is a relationship between the features of this dataset.

So, let’s get started, shall we?

Installing and Importing Seaborn

First, let’s install the Seaborn library in your working environment. Execute the following command in your terminal:

pip install seaborn

pip also installs the packages that Seaborn requires, so you normally do not need to install them separately:

Pandas
NumPy
Matplotlib

SciPy is an optional dependency that Seaborn uses only for some advanced features.

Now let’s import the libraries we’re going to need today:

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Jupyter/IPython only: render plots inside the notebook
%matplotlib inline

Creating a Heatmap using Seaborn

Load the dataset

Prior to creating our plot, let’s check out the dataset:

#Read the dataset
df = pd.read_csv('data.csv')
df.head()

#Check out the number of rows and columns
df.shape

There are 16 columns (or features) in this dataset. Let’s print them all:

#List all column names
print(df.columns)

We need to remember that correlation is computed only for numerical features, so a correlation heat map cannot visualize categorical ones.

So, we are only going to focus on the numerical features of the dataset.
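
One way to see which features are numerical is pandas’ select_dtypes() method. The sketch below uses a tiny made-up stand-in for the cars table; the rows are hypothetical and only the idea carries over to the real data.

```python
import pandas as pd

# Hypothetical stand-in for a few columns of a cars table
cars = pd.DataFrame({
    "Make": ["BMW", "Audi", "Fiat"],        # categorical (text)
    "Engine HP": [335.0, 300.0, 160.0],     # numerical
    "city mpg": [19, 20, 27],               # numerical
})

# Keep only integer and floating-point columns
numeric_cars = cars.select_dtypes(include="number")
print(list(numeric_cars.columns))  # ['Engine HP', 'city mpg']
```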

Plotting the data

Now, we will plot the data using Seaborn’s heatmap() function.

But before that let’s create a correlation matrix using the corr() function:

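#Calculating correlation between each pair of numerical variables
#numeric_only=True skips text columns (pandas 2.0+ raises an error without it)
corr_matrix=df.corr(numeric_only=True)

#Creating a seaborn heatmap
sns.heatmap(corr_matrix)

This generates a heat map of the correlation matrix. Note that numeric_only=True makes corr() use only the numerical features for the plot; older pandas versions dropped non-numerical columns automatically.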
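
If you do not have data.csv at hand, the same two steps can be tried on synthetic data. This is only a sketch under stated assumptions: the column names and values are made up, the Agg backend line is there only so the code runs outside a notebook, and numeric_only=True assumes pandas 1.5 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data with one built-in negative relationship
rng = np.random.default_rng(0)
hp = rng.normal(250, 60, size=200)
demo = pd.DataFrame({
    "hp": hp,
    "mpg": 45 - 0.08 * hp + rng.normal(0, 2, size=200),  # falls as hp rises
    "doors": rng.integers(2, 5, size=200),                # unrelated to the others
    "make": rng.choice(["A", "B"], size=200),             # text column, left out of corr()
})

# Step 1: correlation matrix of the numerical columns only
demo_corr = demo.corr(numeric_only=True)

# Step 2: draw it as a heat map on an explicit Axes
fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(demo_corr, ax=ax)
fig.tight_layout()
```

The hp/mpg cell should stand out from the rest, while the cells involving doors stay close to the color for zero.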

Customizing Heatmaps

Let’s enlarge one of our graphs to view it clearly:

We’ll specify the figsize parameter in the plt.figure() function of Matplotlib to set the dimensions of the figure in inches.

plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix)

cmap: maps data values to color space.

#Parameter - cmap
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG')

Here, we have set the color palette to 'BrBG', a diverging colormap that runs from brown through a light midpoint to blue-green.

You can set different color shades or color combinations as well.

center: specifies the value at which to center the colormap when plotting diverging data. For a correlation matrix, 0 is the natural center.

#Parameter - center
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG', center=0)

annot: when set to boolean True, displays the correlation coefficient for each matrix cell.

#Parameter - annot
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG', center=0, annot=True)

cbar: set to Boolean True by default. When set to False, it removes the color bar beside the heatmap.

#Parameter - cbar
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG', center=0, annot=True, cbar=False)

linewidths: specifies the width of the lines that will divide each cell.
linecolor: specifies the color of the lines that will divide each cell.

#Parameters - linewidths and linecolor
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG', center=0, annot=True, cbar=False, linewidths=0.5, linecolor='red')

yticklabels and xticklabels: control the presence of labels for the Y and X-axis respectively. They default to 'auto', which draws as many labels as fit without overlapping. When set to False, the labels are removed from the heatmap.

#Parameter - xticklabels
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG', center=0, annot=True, xticklabels=False)

square: when set to Boolean True, sets the axes aspect ratio to equal so that each cell of the heatmap is drawn as a square.

#Parameter - square
plt.figure(1, figsize=(10,5))
sns.heatmap(corr_matrix, cmap='BrBG', center=0, annot=True, square=True)

From our heatmap above, we can infer the following:

Features ‘city mpg’ and ‘highway MPG’ have a strong positive correlation, with a value of 0.89.
Features ‘Engine Cylinders’ and ‘Engine HP’ also have a strong positive correlation, with a value of 0.78.
Feature ‘MSRP’ is positively correlated with features ‘Engine Cylinders’ and ‘Engine HP’, with values of 0.53 and 0.66 respectively.
Feature ‘Engine Cylinders’ has a moderately strong negative correlation with features ‘city mpg’ and ‘highway MPG’, with values of -0.6 and -0.62 respectively.
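
Reading pairs off the plot works for a handful of features. For larger matrices, the same inferences can be pulled out programmatically. The sketch below is one possible approach, not part of Seaborn: the helper name strongest_pairs, the 0.5 threshold, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr, threshold=0.5):
    """Return pairs with |correlation| >= threshold, strongest first."""
    # Keep only the cells above the diagonal so each pair is listed once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # Filtering also discards any NaN left over from the masked cells
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Hypothetical correlation matrix for three variables a, b and c
names = ["a", "b", "c"]
example_corr = pd.DataFrame(
    [[1.0, 0.9, -0.2],
     [0.9, 1.0, -0.6],
     [-0.2, -0.6, 1.0]],
    index=names,
    columns=names,
)

result = strongest_pairs(example_corr)
print(result)
```

With the real data you would pass corr_matrix instead of example_corr and adjust the threshold to taste.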

Conclusion

A heatmap in Seaborn displays a color-coded matrix, most often a correlation matrix, for easy visualization of the relationships between the features in the data.

Because Seaborn builds on Matplotlib, a single heatmap() call takes care of the color mapping, annotations, and color bar that would otherwise need several Matplotlib calls.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...
