Series vs. DataFrame in Pandas – Shiksha Online

Series vs. DataFrame in Pandas – Shiksha Online

5 mins read31.5K Views Comment
Updated on Apr 10, 2023 16:17 IST

In this tutorial, we are going to learn the two most common data structures in Pandas – Series and DataFrame. 

2022_03_Series-vs.-DataFrame-in-Pandas.jpg

Pandas is a very popular open-source Python library that offers a diverse set of tools that aid in performing data analysis more efficiently. The Pandas package is mainly used for data pre-processing purposes such as data cleaning, manipulation, and transformation. Hence, it is a very handy tool for data scientists and analysts. In this article, we cover the two most common data structures in Pandas – Series and DataFrame, and also Series vs DataFrame. 

We will cover the following sections:

Installing and Importing Pandas

First, let’s install the pandas library in your working environment. Execute the following command in your terminal:

 
pip install pandas
Copy code

Now let’s import the libraries we’re going to need today:

 
import pandas as pd
import numpy as np
Copy code
Recommended online courses

Best-suited Python for data science courses for you

Learn Python for data science with these high-rated online courses

Free
4 weeks
12 K
8 hours
4.24 K
6 weeks
40 K
100 hours
4.99 K
– / –
– / –
– / –
– / –
60 hours
– / –
90 hours
1.27 L
12 hours

Data Structures in Pandas

Data Structure refers to the specialized way of organizing, processing, and storing data to apply specific types of functionalities to them. 

Pandas has two main types of Data Structures based on their dependability –

  • Series: 1D labeled array
  • DataFrame: 2D labeled tabular structure

Series vs DataFrame

Let’s summarize the difference between the two structures in a table:

Pandas Series Pandas DataFrame
One-dimensional Two-dimensional
Homogenous – Series elements must be of the same data type. Heterogenous – DataFrame elements can have different data types.
Size-immutable – Once created, the size of a Series object cannot be changed. Size-mutable – Elements can be dropped or added in an existing DataFrame.

Now that we have a fair idea about Series and DataFrame, let’s see how we create them in Python, shall we?

Pandas Series

As stated above, Pandas Series is a one-dimensional labeled array whose object size cannot be changed. You can also see it as the primary building block for a DataFrame, making up its rows and columns. 

Following is the basic method to create a Series:

 
#pandas.Series
series = pd.Series(data=None, index=None, dtype=None, name=None)
Copy code
Index Data
0 element 1
1 element 2
2 element 3
3 element 4

The data parameter can take any of the following data types:

  • Python dictionary (dict)
  • ndarray
  • A scalar value

The index parameter accepts list data type that allows you to label your index axis.

The dtype parameter sets the data type of the Series.

The name parameter allows you to name your Series.

Creating a Pandas Series Using Dictionary

If data is of dict type and index is not specified, the dict keys will be the index labels.

 
#Creating a Series from dict
data = {'Mon': 22, 'Tues': 23, 'Wed': 23, 'Thurs': 24, 'Fri': 23, 'Sat': 22, 'Sun': 21}
series = pd.Series(data=data, name='series_from_dict')
print(series)
Copy code
Creating a Pandas Series Using Dictionary

Creating a Pandas Series Using ndarray

If data is a ndarray, the index must be of the same length as the array. If index is not specified, it will be created automatically with values: [0, …, len(data) – 1].

NumPy library has a function random.randint() that produces a ndarray populated with random integers, let’s use that here:

 
#Creating a Series from ndarray
data = np.random.randn(5)
series = pd.Series(data=data,
index=['one', 'two', 'three', 'four', 'five'],
name='series_from_ndarray')
print(series)
Copy code
Creating a Pandas Series Using ndarray

Creating a Pandas Series Using Scalar Values

The data can be assigned a single value. The index has to be provided in this case. The given value will be repeated up to the length of the index.

 
#Creating a Series from a scalar value
series = pd.Series(data=7.3,
index=['a', 'b', 'c', 'd'],
name='series_from_scalar')
print(series)
Copy code
Creating a Pandas Series Using Scalar Values

Pandas DataFrame

Pandas DataFrame, on the other hand, is a two-dimensional structure with columns and rows whose size can be changed. You can also think of it as a dictionary of Series objects. 

Following is the basic method to create a DataFrame:

 
\n \n \n <pre class="python" style="font-family:monospace">\n \n \n <span style="color: #808080;font-style: italic">\n \n \n #pandas.DataFrame\n \n \n
\n \n \n </span style="color: #808080;font-style: italic">\n \n \n </pre class="python" style="font-family:monospace">
Copy code
df = pd.DataFrame(data=None, index=None, columns=None, dtype=None)
Copy code
Index COlumn 1 COLUMN 2
0 element 1 element a
1 element 2 element b
2 element 3 element c
3 element 4 element d

The data parameter can take any of the following data types:

  • Dictionary (dict) of – 1D ndarray, lists, or Series 
  • 2D ndarray
  • Pandas Series
  • Another Pandas DataFrame

The index parameter can be passed optionally, and it accepts row labels.

The columns parameter can also be passed optionally, and it accepts column labels.

The dtype parameter sets the data type of the DataFrame.

Creating a Pandas DataFrame Using a Dictionary of Pandas Series

The index must be the same length as the Series. If index is not specified, it will be created automatically with values: [0, …, len(data) – 1].

 
#Creating a DataFrame from a dictionary of Series
data = pd.DataFrame({
"Class 1": pd.Series([22, 33, 38], index=["math avg", "science avg", "english avg"]),
"Class 2": pd.Series([45, 28, 36], index=["math avg", "science avg", "english avg"]),
"Class 3": pd.Series([32, 41, 47], index=["math avg", "science avg", "english avg"])
})
data
Copy code
Creating a Pandas DataFrame Using a Dictionary of Pandas Series

Creating a Pandas DataFrame Using a Dictionary of Lists or ndarrays

The ndarrays must all be of the same length. The index must be the same length as the arrays. If the index is not specified, the result will be range(n), where n is the array length.

Let’s create the same DataFrame, but this time using lists/ndarrays:

 
#Creating a DataFrame from a dictionary of lists
data = {
"Class 1": [22, 33, 38],
"Class 2": [45, 28, 36],
"Class 3": [32, 41, 47]
}
df = pd.DataFrame(data=data, index=['math avg', 'science avg', 'english avg'])
df
Copy code
Creating a Pandas DataFrame Using a Dictionary of Lists or ndarrays

Creating a Pandas DataFrame Using a List of Dictionaries

 
#Creating a DataFrame from a list of dictionaries
data = [{"col1": 1, "col2": 2}, {"col1": 5, "col2": 10, "col3": 20}]
pd.DataFrame(data)
Copy code
Creating a Pandas DataFrame Using a List of Dictionaries

Creating a Pandas DataFrame Using a Series

When you create a DataFrame using a Series, the resulting DataFrame will have one column whose name is the original name of the Series:

 
#Creating a DataFrame from a Series
data = pd.DataFrame({"Col1": pd.Series([22, 33, 38])})
data
Copy code
Creating a Pandas DataFrame Using a Series

After the creation of a DataFrame, you can query it and select, add, or delete columns from it, i.e., perform Data Manipulation

Pandas DataFrame can be queried in multiple ways – such as loc[] and iloc[] methods.iloc[] can be used to query using the index/position of the value and .loc[] to query using the user-defined keys.

Endnotes

Pandas is a very powerful data processing tool for Python. It offers a rich set of functions to import and process various types of file formats from multiple data sources. The Pandas library is specifically useful for data scientists working with data cleaning and analysis. Hope this article on Series vs DataFrame helped you understand the concept better.


Top Trending Articles:

Data Analyst Interview Questions | Data Science Interview Questions | Machine Learning Applications | Big Data vs Machine Learning | Data Scientist vs Data Analyst | How to Become a Data Analyst | Data Science vs. Big Data vs. Data Analytics | What is Data Science | What is a Data Scientist | What is Data Analyst

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio