Introduction to Seaborn Scatter Plot


Updated on Oct 3, 2023 12:21 IST

When you analyze data as a data scientist, you will almost always perform Exploratory Data Analysis (EDA) in the early stages of data processing. EDA helps you understand, analyze, and summarize the main characteristics of your data, often through data visualization.


One popular tool for this is the Seaborn library in Python. It is built on top of Python's Matplotlib library, integrates well with pandas data structures, and makes data visualization easy and intuitive. In this tutorial, we cover scatter plots using Seaborn. We have already covered the Matplotlib scatter plot in an earlier tutorial.

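In this blog, we will cover the following sections:

  • Introduction to Scatter Plot in Seaborn
  • Installing and Importing Seaborn
  • Get the Seaborn Datasets
  • Creating a Seaborn Scatter Plot
  • Adding Elements to the Scatter Plot
  • Common Attributes of the Scatter Plot
  • Endnotes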

Introduction to Scatter Plot in Seaborn 

A scatter plot is used to visualize the relationship between two numerical variables. The positions of the data points, called markers, let us see whether the variables are correlated, as well as the direction and the strength of that correlation.
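To make "direction" and "strength" concrete, here is a minimal sketch on synthetic data (the sample size, noise level, and variable names are arbitrary choices for illustration, not from the car crash data used later). One pair of variables is built to rise together and the other to move in opposite directions. NumPy's np.corrcoef() gives the Pearson correlation coefficient: its sign is the direction and its magnitude (closer to 1) is the strength that the scatter plots show visually.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data (made up for illustration): y rises or falls with x, plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
noise = rng.normal(scale=0.3, size=200)
y_pos = 2 * x + noise    # positive relationship
y_neg = -2 * x + noise   # negative relationship

# Pearson correlation coefficient: sign = direction, magnitude = strength
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]

# Side-by-side scatter plots of the two relationships
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
sns.scatterplot(x=x, y=y_pos, ax=ax1)
ax1.set_title(f"Positive, r = {r_pos:.2f}")
sns.scatterplot(x=x, y=y_neg, ax=ax2)
ax2.set_title(f"Negative, r = {r_neg:.2f}")
# In a script, call plt.show() to display the figure
```

Both clouds of points are equally tight around a line; only the slope direction differs, which is why the two coefficients have similar magnitudes and opposite signs.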


Installing and Importing Seaborn 

Firstly, we will install the Seaborn library in our working environment through the following command: 

pip install seaborn 

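Seaborn depends on the following libraries. pip installs the required ones automatically along with Seaborn; in recent Seaborn versions, SciPy is an optional dependency that is only needed for some statistical features: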

  • NumPy 
  • Pandas 
  • Matplotlib 
  • SciPy 
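If you want to confirm which of these libraries are already present in your environment, a small sketch using the standard library's importlib.metadata module (Python 3.8+) can report their installed versions. The helper name installed_versions is hypothetical; the package names are the ones these libraries are published under on PyPI, and a missing package is reported as None instead of raising an error.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: version string, or None if not installed} (hypothetical helper)."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

# PyPI distribution names for Seaborn and the libraries listed above
for pkg, ver in installed_versions(["seaborn", "numpy", "pandas", "matplotlib", "scipy"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```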

Let’s import all the libraries we’re going to work with today: 

import seaborn as sns 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 

Get the Seaborn Datasets 

The Seaborn library ships with a collection of example datasets that you can use to try out its plotting functions. The following command lists their names (it fetches the list online, so it needs an internet connection):

#Datasets available in seaborn 
sns.get_dataset_names() 

Today, we are going to pick the ‘car_crashes’ dataset to understand how to visualize data using the Seaborn scatter plot. We are going to analyze the relationship between two numerical features.  

Creating a Seaborn Scatter Plot 

Loading the data 

#Load the data 
df = sns.load_dataset('car_crashes') 
df.head()
#Get column names and data types 
df.info() 

Except for the 'abbrev' column, which holds U.S. state abbreviations, the other 7 columns are numerical features. Now, let's analyze this dataset with a scatter plot.
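Instead of reading the df.info() output, you can also separate numerical from non-numerical columns with pandas' select_dtypes(). The sketch below uses a tiny stand-in DataFrame with the same column names as 'car_crashes' but made-up values, so it runs without downloading anything; with the real data you would pass the df loaded above. The helper name split_columns is hypothetical.

```python
import pandas as pd

# Stand-in with the car_crashes column names; all values are made up
toy = pd.DataFrame({
    "total": [18.0, 15.5],
    "speeding": [7.0, 4.5],
    "alcohol": [5.5, 4.0],
    "not_distracted": [17.0, 14.0],
    "no_previous": [15.0, 13.0],
    "ins_premium": [800.0, 900.0],
    "ins_losses": [140.0, 120.0],
    "abbrev": ["AA", "BB"],
})

def split_columns(frame):
    """Return (numeric column names, non-numeric column names) (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number").columns.tolist()
    other = [col for col in frame.columns if col not in numeric]
    return numeric, other

numeric_cols, other_cols = split_columns(toy)
print(numeric_cols)  # the 7 numerical features
print(other_cols)    # the text column
```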

Plotting the data 

Now, we'll use the sns.scatterplot() function to plot the relationship between the total number of drivers involved in fatal car crashes ('total') and the number of those drivers who were alcohol-impaired ('alcohol'):

#Create Scatterplot 
sns.scatterplot(x='total',y='alcohol',data=df) 

The above plot shows that the 'total' and 'alcohol' columns are positively correlated: states with more drivers involved in fatal crashes also tend to have more alcohol-impaired drivers involved in those crashes. This pattern does make sense, right? Never drink and drive, kids!
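The visual impression can also be checked numerically. pandas' Series.corr() computes the Pearson correlation coefficient by default, and values near +1 indicate a strong positive linear relationship. This sketch uses a small made-up DataFrame so it runs offline; on the real data you would call df['total'].corr(df['alcohol']) (we do not quote the resulting value here).

```python
import pandas as pd

# Made-up values that rise together, standing in for the real 'total'/'alcohol' columns
toy = pd.DataFrame({
    "total": [10.0, 12.0, 15.0, 18.0, 21.0],
    "alcohol": [3.0, 3.5, 4.8, 5.9, 6.5],
})

# Pearson correlation coefficient between the two columns (pandas' default method)
r = toy["total"].corr(toy["alcohol"])
print(f"Pearson r = {r:.2f}")
```

Keep in mind that a correlation coefficient only summarizes a linear trend; the scatter plot is still worth looking at, since it also reveals outliers and curved patterns that a single number hides.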

Let’s label our plot for better readability.  

Adding Elements to the Scatter Plot 

As discussed above, because Seaborn is based on Matplotlib, different pyplot elements can be added to the plot to help ease its interpretation: 

  • plt.title(): Used to specify a plot title 
  • plt.xlabel() and plt.ylabel(): Used to label the x- and y-axis, respectively
  • plt.show(): Used to display the plot 
#Add elements to the Seaborn scatter plot
sns.scatterplot(x='total', y='alcohol', data=df)
plt.xlabel('Total crashes')
plt.ylabel('Alcohol')
plt.title('Total crashes vs Alcohol influence')
plt.show()
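Because sns.scatterplot() is an axes-level function, it returns the Matplotlib Axes it drew on, so the same labels can also be set through that object with Axes.set(). This sketch shows that alternative on a small made-up DataFrame; the column names match 'car_crashes' but the values do not.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the 'total' and 'alcohol' columns
toy = pd.DataFrame({"total": [10.0, 14.0, 18.0, 22.0],
                    "alcohol": [3.0, 4.2, 5.1, 6.8]})

# scatterplot() returns the Axes object it drew on
ax = sns.scatterplot(x="total", y="alcohol", data=toy)

# Set the axis labels and the title in one call on that Axes
ax.set(xlabel="Total crashes", ylabel="Alcohol",
       title="Total crashes vs Alcohol influence")
# In a script, call plt.show() to display the figure
```

Working with the returned Axes is handy when a figure has several subplots, since the pyplot functions only act on whichever Axes is currently active.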

Common Attributes of the Scatter Plot 

Hue parameter 

  • You can use the hue parameter to color the data points according to the values of another variable. Let's see how:
#Add hue 
sns.scatterplot(x='total',y='alcohol',hue='speeding', data=df) 

Can you see how states with more crashes and more alcohol-impaired drivers also tend to have more speeding drivers? This is not surprising, as reckless driving, drunk or not, often leads to accidents as well.
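When the hue variable is numeric, like 'speeding', Seaborn maps it to a continuous color gradient. If you would rather see a few discrete groups, one option (chosen here for illustration, not part of the analysis above) is to bin the column with pandas' pd.cut() and pass the bins as hue. The data values, bin edges, and the 'speeding_level' column name are all made up.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the 'total', 'alcohol', and 'speeding' columns
toy = pd.DataFrame({
    "total": [10.0, 13.0, 16.0, 19.0, 22.0, 25.0],
    "alcohol": [3.0, 3.9, 4.6, 5.5, 6.1, 7.0],
    "speeding": [2.5, 3.8, 4.9, 6.2, 7.5, 9.1],
})

# Bin the numeric column into three labeled groups (hypothetical bin edges)
toy["speeding_level"] = pd.cut(toy["speeding"], bins=[0, 4, 7, 10],
                               labels=["low", "medium", "high"])

# A categorical hue gets one distinct color per group and a matching legend
ax = sns.scatterplot(x="total", y="alcohol", hue="speeding_level", data=toy)
# In a script, call plt.show() to display the figure
```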

Size parameter 

  • You can use the size parameter to vary the marker size according to the values of another variable. Let's see how:
#Specify size parameter 
sns.scatterplot(x='total',y='alcohol',hue='speeding', size='no_previous',  data=df) 
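By default, Seaborn picks the range of marker sizes itself. The sizes parameter of scatterplot() accepts a (min, max) tuple that sets the smallest and largest marker area. The range (20, 200) and the data values in this sketch are arbitrary, made-up choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the columns used above
toy = pd.DataFrame({
    "total": [10.0, 13.0, 16.0, 19.0, 22.0],
    "alcohol": [3.0, 3.9, 4.6, 5.5, 6.1],
    "no_previous": [8.0, 11.0, 13.5, 16.0, 19.0],
})

# Map 'no_previous' to marker area, spread between 20 and 200
ax = sns.scatterplot(x="total", y="alcohol", size="no_previous",
                     sizes=(20, 200), data=toy)

# The drawn collection stores one marker size per point
point_sizes = ax.collections[0].get_sizes()
# In a script, call plt.show() to display the figure
```

Widening the range makes differences in the size variable easier to spot, at the cost of large markers hiding their neighbors.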

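One important point to note is that scatterplot() produces the same plot as relplot(kind='scatter'), because 'scatter' is the default kind in relplot(). The difference is that relplot() is a figure-level function: it creates its own figure, returns a FacetGrid, and places the legend outside the plot area, while scatterplot() draws on a single Matplotlib Axes.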

So, we can also plot the above graph as follows:

#relplot() function 
sns.relplot(x='total', y='alcohol', hue='speeding', size='no_previous', data=df)
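Because relplot() is figure-level, it returns a FacetGrid rather than an Axes, and it can split the data into side-by-side panels with its col (or row) parameter. The 'region' column below is hypothetical, since 'car_crashes' has no such column, so this sketch builds a small made-up DataFrame to show the idea.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data; 'region' is a hypothetical grouping column for illustration
toy = pd.DataFrame({
    "total": [10.0, 14.0, 18.0, 12.0, 16.0, 20.0],
    "alcohol": [3.0, 4.1, 5.2, 3.6, 4.8, 6.0],
    "region": ["North", "North", "North", "South", "South", "South"],
})

# One scatter panel per region; kind="scatter" is relplot's default
g = sns.relplot(x="total", y="alcohol", col="region", data=toy)

# FacetGrid methods apply to every panel at once
g.set_axis_labels("Total crashes", "Alcohol")
g.set_titles("{col_name}")
# In a script, call plt.show() to display the figure
```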

The lightest-colored data points seem to fade into the white background. Let's do something about it.

Background theme and style  

  • The set_theme() function sets the overall look of the plots, and the setting applies to every plot drawn afterwards. Its style parameter can be set to one of the following options:

('white', 'dark', 'whitegrid', 'darkgrid', 'ticks')

#Function - set_theme() 
sns.set_theme(style="darkgrid") 
sns.relplot(x='total',y='alcohol',hue='speeding', size='no_previous', data=df) 
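Since set_theme() changes the style globally, you may sometimes want a style for just one figure. Seaborn's axes_style() returns the style settings as a dictionary and can also be used as a context manager, so only the figures created inside the with-block are affected. This sketch reads back two of the "whitegrid" settings; the data values are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

toy = pd.DataFrame({"total": [10.0, 14.0, 18.0],
                    "alcohol": [3.0, 4.1, 5.2]})  # made-up values

# Parameters that the "whitegrid" style would apply
whitegrid = sns.axes_style("whitegrid")
print(whitegrid["axes.facecolor"], whitegrid["axes.grid"])

# Apply the style only to the figure created inside the with-block
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    sns.scatterplot(x="total", y="alcohol", data=toy, ax=ax)
# In a script, call plt.show() to display the figure
```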

Marker parameter 

  • The default marker is a circle, as you have seen in the outputs above. Markers are customizable: the marker parameter sets the marker style using Matplotlib marker codes, such as:

('v', '^', '>', '<', 'o', '8', 's', 'p', '*', '+', 'h', 'H', 'D', 'd', 'P', 'X')

#Change marker style 
sns.relplot(x='total',y='alcohol',hue='speeding', size='no_previous',  marker='v', data=df) 
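The marker parameter applies one marker to every point. To give each group its own marker, scatterplot() and relplot() also accept a style variable, and the markers parameter can map each of its levels to a marker code. The 'group' column and the marker choices below are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data with a hypothetical two-level grouping column
toy = pd.DataFrame({
    "total": [10.0, 14.0, 18.0, 12.0, 16.0, 20.0],
    "alcohol": [3.0, 4.1, 5.2, 3.6, 4.8, 6.0],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# 'style' picks the marker per group; 'markers' maps each level to a marker code
ax = sns.scatterplot(x="total", y="alcohol", hue="group", style="group",
                     markers={"A": "o", "B": "v"}, data=toy)
# In a script, call plt.show() to display the figure
```

Mapping the same column to both hue and style gives each group two visual cues, which helps readers with color vision deficiencies and survives grayscale printing.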

Plot colors and palette parameter  

  • The color_palette() function, called without arguments, returns the current default color palette, which you can display with the palplot() function:
#Function – color_palette() 
snscolors = sns.color_palette() 
sns.palplot(snscolors) 
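color_palette() also accepts a palette name and a number of colors, and the returned palette can be converted to hex codes with its as_hex() method, which is handy when you need the exact colors elsewhere. 'pastel' is one of Seaborn's built-in palettes; asking for 3 colors is an arbitrary choice for this sketch.

```python
import seaborn as sns

# Three colors from Seaborn's built-in "pastel" palette
pastel = sns.color_palette("pastel", n_colors=3)

# Each entry is an (r, g, b) tuple of floats in [0, 1]; as_hex() gives "#rrggbb" strings
pastel_hex = pastel.as_hex()
print(list(pastel_hex))

# Display the palette as a row of color swatches
sns.palplot(pastel)
```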
  • The palette parameter sets the colors used for the different levels of the 'hue' variable. It accepts anything that color_palette() can interpret, such as a palette name like 'pastel'.
#The palette parameter 
sns.relplot(x='total',y='alcohol',hue='speeding', palette='pastel', data=df)
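For a categorical hue, palette can also be a dictionary that maps each hue level to a specific color, which keeps colors consistent across several plots. The 'group' column, its levels, and the colors chosen here are made up for illustration; the last lines read back the color of the first point to confirm the mapping.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import to_rgb

# Made-up values; 'group' is a hypothetical categorical column
toy = pd.DataFrame({
    "total": [10.0, 14.0, 18.0, 12.0, 16.0, 20.0],
    "alcohol": [3.0, 4.1, 5.2, 3.6, 4.8, 6.0],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# Fix the color of each hue level explicitly
g = sns.relplot(x="total", y="alcohol", hue="group",
                palette={"A": "tab:blue", "B": "tab:orange"}, data=toy)

# The first point belongs to group "A", so it should be drawn in tab:blue
first_point_rgb = tuple(g.ax.collections[0].get_facecolors()[0][:3])
print(first_point_rgb, to_rgb("tab:blue"))
# In a script, call plt.show() to display the figure
```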

Alpha parameter 

  • You can use the alpha parameter to set the opacity of the markers, from 0 (fully transparent) to 1 (fully opaque):
#The alpha parameter 
sns.relplot(x='total',y='alcohol',hue='speeding', alpha=0.3, data=df) 
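Transparency is most useful when markers overlap: where several semi-transparent points stack up, the area looks darker, which reveals dense regions. The alpha value ends up on the Matplotlib collection that Seaborn draws, so you can read it back, as this sketch on made-up data (with deliberately overlapping points) does.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up values with deliberate overlaps
toy = pd.DataFrame({"total": [10.0, 10.0, 10.2, 14.0, 14.1, 18.0],
                    "alcohol": [3.0, 3.0, 3.1, 4.1, 4.1, 5.2]})

# alpha=0.3 draws each marker at 30% opacity
ax = sns.scatterplot(x="total", y="alcohol", alpha=0.3, data=toy)

# Read back the transparency stored on the drawn points
alpha_used = ax.collections[0].get_alpha()
# In a script, call plt.show() to display the figure
```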

Endnotes 

A scatter plot is a very popular chart for data analysis, and Seaborn is one of the most common libraries for creating one. For basic plots, it is often easier to use than Matplotlib alone: it works directly with pandas DataFrames and can map columns to color (hue), size, and marker style in a single function call. If you found this article informative, kindly leave a comment below.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...