Data Visualization using Matplotlib – A Beginner’s Guide
Data Visualization is a way of summarizing data visually. Huge amounts of data are being collected all around you at all times of the day – whether it’s through surveys or social media tracking or even the transactions you’re making. The data provides useful insights for businesses and visualizations make it easier to identify trends and patterns in text-based data.
“Humans are visual creatures. Half of the human brain is directly or indirectly devoted to processing visual information.”
Visualizations are one of the easiest ways to take in and analyze information. Data Visualization also paves the way for higher-level analysis in Exploratory Data Analysis (EDA) and Machine Learning (ML).
In this blog on Data Visualization using Matplotlib, we will be covering the following sections:
- Introduction to Matplotlib
- Installing and Importing Matplotlib
- The Relation between – Matplotlib, PyPlot, Python
- Creating a Simple Plot
- Adding Elements in a Plot
- Making Multiple Plots in One Figure
- Creating Subplots
- Figure Objects
- Axes Objects
- Different Types of Plots
- Saving Plots
Introduction to Matplotlib
Matplotlib is a library used mainly to create static 2D plots, although it also has some support for 3D visualizations. It makes producing both simple and advanced plots straightforward and intuitive. It can be used in Python scripts, Jupyter notebooks, and web application servers.
Let’s understand how to use Matplotlib practically with a fun example. The dataset used in this blog can be found here. It contains information on the global happiness survey in the year 2021. This data describes how measurements of well-being can be used effectively to assess the progress of nations.
Installing and Importing Matplotlib
Let’s start with installing the library in your working environment first. Execute the following command in your terminal:
pip install matplotlib
Now let’s import the libraries we’re going to need today:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In Matplotlib, pyplot is used to create figures and change their characteristics.
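%matplotlib inline is an IPython magic command, not a Python function. It makes plots render directly below the cell in Jupyter Notebook. It is not valid in a plain Python script, so leave it out there.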
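A quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal sketch; the version string you see depends on your environment.

```python
import matplotlib

# Print the installed Matplotlib version to confirm the import works.
print(matplotlib.__version__)
```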
The Relation between – Matplotlib, PyPlot, and Python
- Python is a programming language widely used for mathematical and statistical analysis. It runs on most platforms and has many libraries for data manipulation, transformation, and visualization.
- In Python, data visualizations can be done via many libraries. The most popular and widely used of them all, and the one we’re discussing in this blog, is the Matplotlib library. In fact, many of the other libraries utilize attributes of Matplotlib to display the plots they generate.
- PyPlot is a module in Matplotlib which provides a MATLAB-like interface. MATLAB is licensed software, whereas PyPlot is an open-source module that provides similar functionality.
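To make the relationship more concrete, here is a minimal sketch that draws the same line twice: once through pyplot's implicit, MATLAB-like interface and once by calling methods on explicit Figure and Axes objects. The x and y values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up values, used only for illustration.
x = [1, 2, 3, 4]
y = [10, 20, 15, 30]

# pyplot style: functions act implicitly on the "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
pyplot_axes = plt.gca()  # the axes pyplot has been drawing on

# Object-oriented style: create the Figure and Axes explicitly
# and call methods on them.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented interface")

# In a script, call plt.show() to open the figures.
```

Both styles produce the same kind of plot. The pyplot style is shorter for quick exploration, while the explicit style makes it clearer which axes each call affects once a figure holds several plots.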
Creating a Simple Plot
Load the dataset
Prior to creating our graphs, let’s check out the dataset
#Read the dataset
df = pd.read_csv('world-happiness-report-2021.csv')
df.head()
#List all column names
for col_name in df.columns:
    print(col_name)
Here, Ladder score is basically the happiness score, explained by six factors.
Dystopia is a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors.
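If you do not have the CSV at hand, the same inspection steps can be tried on a tiny stand-in table. In the sketch below the rows are invented (they are not taken from the real report), 'Country name' is assumed as the label column, and only the three numeric columns used later in this blog are kept.

```python
import io

import pandas as pd

# Invented rows standing in for 'world-happiness-report-2021.csv';
# the values are NOT taken from the real report.
csv_text = """Country name,Ladder score,Logged GDP per capita,Healthy life expectancy
Country A,7.5,10.8,72.0
Country B,6.1,9.9,68.5
Country C,4.3,8.2,60.1
"""

# read_csv accepts a file-like object as well as a file path.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())         # first rows of the table
print(list(sample_df.columns))  # column names
print(sample_df.shape)          # (number of rows, number of columns)
```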
Now, let’s move ahead with analyzing this dataset through Data Visualization using Matplotlib.
Line Graphs/Plots
These are basically the simplest graphs you can create using Matplotlib. Let’s create one to analyze the relationship between a country’s happiness score and life expectancy.
Here, the data points will be plotted as y (‘Healthy life expectancy’) versus x (‘Ladder score’) using the plot() function:
#Create Series
expectancy = df['Healthy life expectancy']
score = df['Ladder score']

plt.plot(score, expectancy)
A basic line plot is generated as shown above.
Adding Elements in a Plot
The plot we have created would not be easily understandable to a third pair of eyes without context, so let’s try to add different elements to interpret it better:
- Use plt.title() for setting a plot title
- Use plt.xlabel() and plt.ylabel() for labeling x and y-axis respectively
- Use plt.legend() for the observation variables
- Use plt.show() for displaying the plot
plt.plot(score, expectancy)

plt.title('Happiness Plot')
plt.xlabel('Happiness Score')
plt.ylabel('Age')

plt.legend(['Healthy Life Expectancy'])
plt.show()
Can you see how labeling a graph vastly improves its readability? One can easily point out that life expectancy generally increases with a higher happiness score.
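An alternative to passing a list of names to legend() is to attach a label to each line when it is drawn. The sketch below shows that variant on explicit Axes methods; the four data points are made up and stand in for the real columns.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the dataset columns.
score = [4.3, 5.2, 6.1, 7.5]
expectancy = [60.1, 64.0, 68.5, 72.0]

fig, ax = plt.subplots()

# Passing label= here lets legend() pick up the text automatically.
ax.plot(score, expectancy, label="Healthy Life Expectancy")

ax.set_title("Happiness Plot")
ax.set_xlabel("Happiness Score")
ax.set_ylabel("Age")
legend = ax.legend()

# In a script, call plt.show() to open the figure.
```

Keeping the label next to the plot call means the legend stays correct even if the order of the lines changes later.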
There are many more elements we can experiment with when creating graphs:
#Add color, style, width to line element
plt.plot(score, expectancy, color = 'green', linestyle = '--', linewidth=1.2)
#Add grid using grid() method
plt.grid(True)
plt.plot(score, expectancy)
The plots can be customized based on the following attributes:
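As one illustration, the sketch below combines several commonly used line attributes (color, linestyle, linewidth, marker, markersize, alpha) in a single plot() call and then reads them back from the returned line object. The data values are made up.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the dataset columns.
score = [4.3, 5.2, 6.1, 7.5]
expectancy = [60.1, 64.0, 68.5, 72.0]

fig, ax = plt.subplots()

# plot() returns a list of Line2D objects; unpack the single line.
(line,) = ax.plot(
    score,
    expectancy,
    color="green",    # line color
    linestyle="--",   # dashed line
    linewidth=1.2,    # line thickness in points
    marker="o",       # draw a circle at every data point
    markersize=5,     # marker size in points
    alpha=0.8,        # slight transparency
)
ax.grid(True)

# The attributes can be read back from the line object.
print(line.get_color(), line.get_linestyle(), line.get_linewidth())
```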
Making Multiple Plots in One Figure
Let’s compare the GDP and life expectancy of countries against their happiness score. For comparison, we’ll need to plot ‘happiness score vs GDP’ and ‘happiness score vs life expectancy’ in a single figure. Let’s see how to do that:
#Create Series for GDP
gdp = df['Logged GDP per capita']

plt.plot(score, expectancy)
plt.plot(score, gdp)

plt.title('Happiness Score vs GDP and Life Expectancy')
plt.xlabel('Happiness Score')

plt.legend(['Life Expectancy','GDP'])
plt.show()
From this graph, we can also visually identify a trend: both GDP per capita and life expectancy have higher values for countries with higher happiness scores.
If you want to display the plots in separate figures, use plt.show() after each plot statement as shown below:
plt.plot(score, expectancy)
plt.title('Happiness Score vs Life Expectancy')
plt.xlabel('Happiness Score')

plt.show()

plt.plot(score, gdp, color ='orange')
plt.title('Happiness Score vs GDP')
plt.xlabel('Happiness Score')

plt.show()
Through these separate graphs, we can see that when there is a spike/dip for GDP per capita for a given score, there is also a spike/dip for life expectancy for the same score.
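Another way to keep plots apart, one that does not depend on where plt.show() is called, is to open each figure explicitly with plt.figure(). The sketch below uses made-up values in place of the dataset columns.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the dataset columns.
score = [4.3, 5.2, 6.1, 7.5]
expectancy = [60.1, 64.0, 68.5, 72.0]
gdp = [8.2, 9.0, 9.9, 10.8]

# Each call to plt.figure() starts a new, separate figure.
fig1 = plt.figure()
plt.plot(score, expectancy)
plt.title("Happiness Score vs Life Expectancy")
plt.xlabel("Happiness Score")

fig2 = plt.figure()
plt.plot(score, gdp, color="orange")
plt.title("Happiness Score vs GDP")
plt.xlabel("Happiness Score")

# In a script, a single plt.show() at the end opens both figures.
```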
Creating Subplots
We use pyplot.subplots to create a figure and a grid of subplots with a single call. For example, for the previous scenario, we could create subplots using the following lines of code:
#Creating two subplots
fig, axs = plt.subplots(2)
fig.suptitle('Vertically stacked subplots')
axs[0].plot(score, expectancy)
axs[1].plot(score, gdp, color = 'orange')
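The same call also takes a number of rows and a number of columns. As a minimal sketch with made-up data, plt.subplots(1, 2) places the two plots side by side instead of stacking them:

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the dataset columns.
score = [4.3, 5.2, 6.1, 7.5]
expectancy = [60.1, 64.0, 68.5, 72.0]
gdp = [8.2, 9.0, 9.9, 10.8]

# One row, two columns; axs is a 1-D array holding the two Axes.
fig, axs = plt.subplots(1, 2, figsize=(10, 4))
fig.suptitle("Horizontally stacked subplots")

axs[0].plot(score, expectancy)
axs[0].set_title("Life Expectancy")

axs[1].plot(score, gdp, color="orange")
axs[1].set_title("GDP")

# In a script, call plt.show() to open the figure.
```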
Figure Objects
matplotlib.figure is a module in Matplotlib that provides the Figure object, which contains all the plot elements. This module also controls the default spacing of the subplots. The matplotlib.figure.Figure class is the top-level container for the plot elements.
plt.figure() creates an empty figure object and returns it as a Figure instance. It accepts the following optional parameters:
- figsize: Figure dimension (width, height) in inches
- dpi: Dots per inch
- facecolor: Figure patch facecolor
- edgecolor: Figure patch edge color
- linewidth: Linewidth of the frame
Let’s create a figure object:
#Creating a figure object fig
fig=plt.figure(figsize=(10,4), facecolor ='green', edgecolor='r',linewidth=5)

plt.plot(score, expectancy)
plt.show()
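The figsize and dpi parameters together determine the pixel size of the rendered figure. The sketch below reads both values back from the figure object and multiplies them; the specific numbers are arbitrary, and some interactive backends may adjust the dpi on high-resolution screens.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots per inch.
fig = plt.figure(figsize=(10, 4), dpi=100, facecolor="lightgray")

size_inches = fig.get_size_inches()  # array: [width, height]
dpi = fig.get_dpi()

# Pixel dimensions = inches * dots per inch.
width_px, height_px = size_inches * dpi
print(size_inches, dpi, width_px, height_px)
```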
Axes Objects
The matplotlib.axes is a module that contains most of the figure elements: Axis, Tick, Line2D, Text, Polygon, etc., and sets the coordinate system.
The matplotlib.axes.Axes() class supports callbacks through func(ax) where ax is the axes instance.
- Use add_axes() to add axes to the figure
- Use ax.set_title() for setting title
- Use ax.set_xlabel() and ax.set_ylabel() for setting x and y-label respectively
Let’s see how to add axes to our figure:
#Creating a figure object fig
fig = plt.figure()

#Adding the axes
ax = fig.add_axes([0,0,2,1])

plt.plot(score, expectancy)
plt.plot(score, gdp)

ax.legend(labels = ('Healthy life expectancy', 'GDP per capita'), loc = 'upper left')

ax.set_title("Usage of add_axes function")
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
plt.show()
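The list passed to add_axes() is [left, bottom, width, height], each given as a fraction of the figure size. As a minimal sketch with made-up data, keeping all four values between 0 and 1 places the axes fully inside the figure, and a second call can add a small inset on top of the first axes:

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the dataset columns.
score = [4.3, 5.2, 6.1, 7.5]
expectancy = [60.1, 64.0, 68.5, 72.0]
gdp = [8.2, 9.0, 9.9, 10.8]

fig = plt.figure()

# [left, bottom, width, height] as fractions of the figure size.
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
inset_ax = fig.add_axes([0.6, 0.2, 0.25, 0.25])

main_ax.plot(score, expectancy)
main_ax.set_title("Main axes with an inset")

inset_ax.plot(score, gdp, color="orange")
inset_ax.set_title("GDP", fontsize=8)

# In a script, call plt.show() to open the figure.
```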
Different Types of Plots
Matplotlib provides a wide variety of plot formats to support various methods of visualization. Let’s go through a few of the most popular methods of Data Visualization using Matplotlib:
Bar Graphs/Plots
These graphs represent data through bars. The bar() function is used to plot a bar graph. Bars can be plotted vertically (the default) or horizontally (using the barh() function).
Bar graphs are typically used with categorical variables. However, they can also be used with numerical variables, as we’ll see in our case. These graphs work with discrete values, so we’ll convert our Ladder score to integer type.
#Converting to int
HappinessScore = score.apply(int)

#Counting the number of times each score occurs – the height of the bars
count = HappinessScore.value_counts()

#Score of each count – X-axis
HapScore = count.index

#Plotting the bar graph
plt.bar(HapScore, count)
plt.title('Happiness Score')
plt.xlabel('Score')
plt.ylabel('Count')
plt.show()
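For the more typical categorical case, and to show the horizontal variant, here is a minimal barh() sketch. The country names and scores are placeholders, not values from the dataset.

```python
import matplotlib.pyplot as plt

# Placeholder categories and values, used only for illustration.
countries = ["Country A", "Country B", "Country C"]
scores = [7.5, 6.1, 4.3]

fig, ax = plt.subplots()

# barh() draws one horizontal bar per category.
bars = ax.barh(countries, scores)
ax.set_title("Happiness Score by Country")
ax.set_xlabel("Happiness Score")

# Each bar's width corresponds to its value.
print([bar.get_width() for bar in bars])
```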
Histograms
Similar to bar graphs, histograms display frequency/count values, but over intervals called bins. The frequency count goes on the y-axis and the intervals on the x-axis. The hist() function is used to plot histograms, as shown below:
plt.hist(score)
plt.title('Happiness Score Distribution')
plt.xlabel('Happiness Score')
plt.ylabel('Frequency')
plt.show()
Since the happiness score takes a continuous range of values, the histogram above gives a general idea of how the scores are distributed, though we can’t read exact data values off it.
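The bin boundaries can be set explicitly through the bins parameter, and hist() also returns the counts it computed, which helps when the exact figures are needed. A minimal sketch with made-up scores:

```python
import matplotlib.pyplot as plt

# Made-up scores, used only for illustration.
scores = [4.3, 4.8, 5.1, 5.2, 5.9, 6.1, 6.4, 7.0, 7.5]

fig, ax = plt.subplots()

# Explicit bin edges: [4, 5), [5, 6), [6, 7), [7, 8]
counts, bin_edges, _ = ax.hist(scores, bins=[4, 5, 6, 7, 8])
ax.set_title("Happiness Score Distribution")
ax.set_xlabel("Happiness Score")
ax.set_ylabel("Frequency")

print(counts)     # number of scores falling in each bin
print(bin_edges)  # the bin boundaries that were used
```

With these made-up values the counts are 2, 3, 2 and 2, which add up to the nine scores passed in.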
Boxplots
A boxplot displays the distribution of data through quartiles. The three quartiles (Q1, Q2, Q3) are the cut points that divide the sorted data into four equal groups. The median (Q2) separates the lower half from the upper half of the data.
The function used for the boxplot is boxplot(). Boxplots are used to detect outliers and to see how tightly the data is grouped.
Let’s see if our Ladder score has any outlier data:
plt.boxplot(df['Ladder score'])
plt.show()
Hardly. There is just one little circle below the lower whisker (the ‘Minimum’ of the boxplot).
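To see where such a circle comes from, the sketch below computes the quartiles by hand with NumPy and applies the 1.5 × IQR rule that boxplot() uses by default for its whiskers. The eight values are made up, with one deliberately low value.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up scores with one deliberately low value.
values = np.array([2.5, 4.9, 5.2, 5.5, 5.8, 6.1, 6.4, 7.0])

# Quartiles and interquartile range (IQR).
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers)

# boxplot() marks the same points as "fliers" (the little circles).
fig, ax = plt.subplots()
result = ax.boxplot(values)
fliers = result["fliers"][0].get_ydata()
print(fliers)
```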
Scatter Plots
Scatter plots often reveal relationships or associations between two numerical variables. The function used for the scatter plot is scatter(), as shown below:
plt.scatter(gdp, score)
plt.title('GDP vs Happiness Score')
plt.xlabel('GDP per Capita')
plt.ylabel('Happiness Score')
plt.show()
As expected, the higher a country’s GDP per capita, the higher its happiness score tends to be.
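A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below draws a scatter plot and computes the Pearson correlation with NumPy's corrcoef(). The four data points are made up, so the resulting value says nothing about the real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for the dataset columns.
gdp = [8.2, 9.0, 9.9, 10.8]
score = [4.3, 5.2, 6.1, 7.5]

fig, ax = plt.subplots()
points = ax.scatter(gdp, score, alpha=0.7)
ax.set_title("GDP vs Happiness Score")
ax.set_xlabel("GDP per Capita")
ax.set_ylabel("Happiness Score")

# Pearson correlation: values near +1 indicate a strong positive
# linear relationship, values near 0 indicate none.
r = np.corrcoef(gdp, score)[0, 1]
print(round(r, 3))
```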
Saving Plots
Let’s try saving the scatter plot we have created above:
fig = plt.figure()
plt.scatter(gdp, score)
plt.title('GDP vs Happiness Score')
plt.xlabel('GDP per Capita')
plt.ylabel('Happiness Score')
fig.savefig('scatterplot.png')
The image is saved in the current working directory with the filename ‘scatterplot.png’.
To view the saved image, we’ll use the matplotlib.image module, as shown below:
#Displaying the saved image
import matplotlib.image as mpimg

image = mpimg.imread("scatterplot.png")
plt.imshow(image)
plt.show()
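savefig() also accepts options that control the output quality. The sketch below sets the resolution with dpi and trims surrounding whitespace with bbox_inches='tight'. It writes to a temporary directory so it does not overwrite anything; the data values and the dpi of 150 are arbitrary choices for illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Made-up values standing in for the dataset columns.
gdp = [8.2, 9.0, 9.9, 10.8]
score = [4.3, 5.2, 6.1, 7.5]

fig, ax = plt.subplots()
ax.scatter(gdp, score)
ax.set_title("GDP vs Happiness Score")
ax.set_xlabel("GDP per Capita")
ax.set_ylabel("Happiness Score")

# Write into a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "scatterplot.png")

# dpi sets the resolution; bbox_inches="tight" trims extra whitespace.
fig.savefig(out_path, dpi=150, bbox_inches="tight")

print(out_path, os.path.getsize(out_path), "bytes")
```

The output format follows the file extension, so the same call with a name ending in .pdf or .svg would write a vector file instead.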
Endnotes
Matplotlib is one of the oldest Python data visualization libraries, and thanks to its wealth of features and ease of use it is still one of the most widely used. Matplotlib was first released back in 2003 and has been continuously updated since. We hope this article helped you understand the concepts of Data Visualization using Matplotlib.