Working with Date and Time in Pandas

Vikram Singh
Assistant Manager - Content
Updated on Oct 14, 2024 09:53 IST
Learn how to work effectively with dates and times in Pandas to improve your data analysis skills. This guide covers essential techniques for managing time-series data, manipulating timestamps, and gaining insights from temporal data.
Whether you're a data scientist or analyst, you'll discover how to use datetime functionality to transform your time-based datasets with ease. Start exploring and become proficient at handling dates!

Using the correct data type is crucial for accurate analysis. While Pandas can usually infer data types accurately, there are scenarios where explicit data conversion is necessary. This article will demonstrate how Pandas manages dates and times using the DateTime data type.

Date and time features play a critical role in various data science fields, such as marketing, sales, e-commerce, and retail.

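This article covers the following topics: DateTime objects, Timedelta objects, DateTime formats, extracting and assembling DateTime components, and date ranges.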

DateTime Objects in Pandas

Convert string to DateTime

Pandas has a built-in function called to_datetime() that converts date and time in string format to a DateTime object.

To understand this through example, let’s create a sample DataFrame as shown below:


 
#Importing required libraries
import pandas as pd
import numpy as np
from datetime import datetime
#Creating a sample DataFrame
df = pd.DataFrame({'date': ['04/03/2022', '05/03/2022', '06/03/2022'],
                   'patients': [16, 19, 11]})
df

Let’s check the data types of the DataFrame:

df.dtypes

As you can see, the ‘date’ column in the DataFrame is currently a string-type object. 

Let’s parse this to datetime, as shown:


 
df['date'] = pd.to_datetime(df['date'])
df

print(type(df['date']))

Let’s check the converted data type:


 
df.dtypes

Thus, to_datetime() converts the column to a series of the appropriate datetime64 dtype.
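As a quick check of the conversion described above, the following minimal sketch inspects a column's dtype before and after parsing. The `visits` DataFrame and its values are hypothetical, and ISO-formatted strings are assumed so that no day/month ambiguity arises.

```python
import pandas as pd

# Hypothetical sample data in ISO format (unambiguous year-month-day).
visits = pd.DataFrame({"date": ["2022-03-04", "2022-03-05", "2022-03-06"],
                       "patients": [16, 19, 11]})

before = visits["date"].dtype            # a text (object/string) dtype
visits["date"] = pd.to_datetime(visits["date"])
after = visits["date"].dtype             # a datetime64 dtype

# pandas ships a helper for this check, so dtype names need not be compared.
is_datetime = pd.api.types.is_datetime64_any_dtype(visits["date"])
print(before, after, is_datetime)
```

Checking with `is_datetime64_any_dtype` is one option among several; comparing against the printed dtype works too, but the exact dtype name can differ between pandas versions.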

Convert integer to DateTime

The to_datetime() function can also convert integers to DateTime objects. Two optional arguments control how the integer is interpreted:

  • unit: defines what the integer counts, such as days ('D'), seconds ('s'), milliseconds ('ms'), microseconds ('us'), or nanoseconds ('ns', the default)
  • origin: defines the reference point you want to start counting your units from (the default is 'unix', that is, 1970-01-01)

For example, 


 
pd.to_datetime(14667, unit='D', origin='unix')

Here, the unit parameter is set to days ('D'), and the origin parameter is set to the Unix epoch, that is, 1970-01-01. So, the timestamp displays the date 14667 days after the Unix epoch.

Let’s take another example,


 
pd.to_datetime(996209999, unit='s', origin='2000-01-01')

The timestamp displays the date and time 996209999 seconds from 2000-01-01. Phew, that’s a lot of seconds!
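To make the role of unit and origin concrete, here is a small sketch that cross-checks the integer conversion against plain Timedelta arithmetic. The variable names and the 60-second value are illustrative choices, not part of the original examples.

```python
import pandas as pd

# 14667 days counted from the Unix epoch (1970-01-01).
from_days = pd.to_datetime(14667, unit="D", origin="unix")

# The same arithmetic done by hand with a Timedelta, as a cross-check.
by_hand = pd.Timestamp("1970-01-01") + pd.Timedelta(days=14667)

# Seconds counted from a custom origin instead of the epoch.
from_custom = pd.to_datetime(60, unit="s", origin="2000-01-01")

print(from_days)     # 2010-02-27 00:00:00
print(from_custom)   # 2000-01-01 00:01:00
```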

Timedelta Objects in Pandas

When you subtract one date from another, the result is returned as a timedelta object. Pandas also has a built-in function called to_timedelta() that converts a number or string into a timedelta, which you can then add to or subtract from a date.

Let’s understand this through example:


 
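#Timedelta - duration between dates
date = datetime.now()
#Present Date
print(date)
#Date Tomorrow
print(date + pd.to_timedelta(1, unit='D'))
#Date After a Month (a month is not a fixed duration, so use DateOffset)
print(date + pd.DateOffset(months=1))

Here, too, the unit argument defines whether that’s days, hours, minutes, seconds, etc. Note that unit='m' means minutes, not months; because months vary in length, they cannot be expressed as a timedelta unit, and pd.DateOffset(months=1) is used instead.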
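The sketch below shows both directions of timedelta arithmetic: subtracting two timestamps to get a duration, and adding a duration to a timestamp. The timestamps are made-up values chosen so that the month-end behaviour is visible.

```python
import pandas as pd

start = pd.Timestamp("2022-01-31 09:00")
end = pd.Timestamp("2022-02-02 15:30")

# Subtracting two timestamps yields a Timedelta.
gap = end - start
hours = gap.total_seconds() / 3600

# Adding a fixed duration built with to_timedelta().
next_day = start + pd.to_timedelta(1, unit="D")

# Calendar-aware shift: one month after Jan 31 is clipped to the end of February.
next_month = start + pd.DateOffset(months=1)

print(gap)         # 2 days 06:30:00
print(hours)       # 54.5
print(next_day)    # 2022-02-01 09:00:00
print(next_month)  # 2022-02-28 09:00:00
```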

DateTime Format

By default, the to_datetime() function parses ambiguous strings in month-first format (MM-DD-YYYY). You can specify your datetime format explicitly using the format argument, as shown:


 
pd.to_datetime(df['date'], format="Your_datetime_format")

However, if you’re outside of the United States, you probably write the day-first (DD-MM-YYYY).

You can make Pandas parse the day first instead of the month by setting the argument dayfirst=True, as shown:


 
df = pd.DataFrame({'date': ['4/03/2022', '5/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df
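To see why the day-first setting matters, this short sketch parses the same ambiguous string three ways. The string is illustrative; the explicit format variant is shown as one way to remove the ambiguity entirely.

```python
import pandas as pd

raw = "04/03/2022"  # could mean April 3 or March 4

month_first = pd.to_datetime(raw)                    # default: April 3
day_first = pd.to_datetime(raw, dayfirst=True)       # March 4
explicit = pd.to_datetime(raw, format="%d/%m/%Y")    # March 4, no guessing

print(month_first.date(), day_first.date(), explicit.date())
```

An explicit format is usually the more predictable choice when the layout of the input is known in advance, since dayfirst only influences how ambiguous strings are guessed.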

Custom format

The format argument of the to_datetime() function allows you to pass a custom format. For example, let’s say you want to parse your string with the following format – YYYY-MM-DD HH:MM:SS. Let’s see how we can do this:


 
#Creating another Sample DataFrame
df0 = pd.DataFrame({'time_of_death': ['2020-7-13 21:15:0',
                                      '2020-7-9 17:30:30',
                                      '2020-12-10 4:45:12'],
                    'patientID': [101, 23, 48]})
df0['time_of_death'] = pd.to_datetime(df0['time_of_death'],
                                      format="%Y-%m-%d %H:%M:%S")
df0
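The same format codes work in both directions. The sketch below parses a non-default layout with format and then writes the dates back out as strings with dt.strftime(). The input layout (day.month.year) and the values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical timestamps written as day.month.year hour:minute.
stamps = pd.Series(["13.07.2020 21:15", "09.07.2020 17:30"])

# Parse with an explicit format string.
parsed = pd.to_datetime(stamps, format="%d.%m.%Y %H:%M")

# Format the parsed values back into text using the same kind of codes.
labels = parsed.dt.strftime("%Y/%m/%d")

print(parsed.tolist())
print(labels.tolist())   # ['2020/07/13', '2020/07/09']
```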

Handling Parsing Errors

If a date string cannot be parsed as a timestamp, you will get a ValueError, as shown:


 
df = pd.DataFrame({'date': ['4/03/2022', 'x/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df

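To avoid this, you can set the errors argument to ‘ignore’, which returns the input unchanged, or to ‘coerce’, which replaces unparseable values with NaT. Note that errors='ignore' is deprecated as of pandas 2.2, so ‘coerce’ is the safer choice in new code.

df['date'] = pd.to_datetime(df['date'], dayfirst=True, errors='ignore')
df

df['date'] = pd.to_datetime(df['date'], dayfirst=True, errors='coerce')
df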
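Coercing errors silently turns bad values into NaT, so it is often useful to check which rows failed. The following is a minimal sketch of that check; the sample strings and the explicit day/month/year format are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw column with one malformed entry.
raw = pd.Series(["04/03/2022", "xx/03/2022", "06/03/2022"])

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

# Use the NaT positions to look up the original offending strings.
bad_rows = raw[parsed.isna()]

print(parsed.tolist())
print(bad_rows.tolist())   # ['xx/03/2022']
```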

Extract DateTime into Different Columns

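A Pandas DateTime column exposes the following built-in attributes through the .dt accessor:

  • dt.year – to get the year
  • dt.month – to get the month
  • dt.day – to get the day
  • dt.isocalendar().week – to get the ISO week of the year (the older dt.week attribute was removed in pandas 2.0)
  • dt.dayofweek – to get the day of the week (Monday is 0)
  • dt.is_leap_year – to check whether the year is a leap year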

Let’s see how we can get these separately:


 
df = pd.DataFrame({'date': ['4/03/2022', '5/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df

#dt.week was removed in pandas 2.0; isocalendar().week replaces it
df['week_of_year'] = df['date'].dt.isocalendar().week
df['day_of_week'] = df['date'].dt.dayofweek
df['is_leap_year'] = df['date'].dt.is_leap_year
df
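Beyond the numeric components, the .dt accessor can also produce readable labels and derived flags. This sketch is one possible way to combine them; the column names `day_name` and `is_weekend` are hypothetical choices, not pandas conventions.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2022-03-04", "2022-03-05", "2022-03-06"]))

parts = pd.DataFrame({
    "date": dates,
    "week_of_year": dates.dt.isocalendar().week,   # ISO week number
    "day_name": dates.dt.day_name(),               # e.g. 'Friday'
    "is_weekend": dates.dt.dayofweek >= 5,         # Saturday=5, Sunday=6
})

print(parts)
```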

 

Assemble DateTime from Different Columns

You can use the to_datetime() function to assemble a datetime from multiple columns as well. The keys (column labels) can be common abbreviations such as 'year', 'month', and 'day':


 
#Creating another Sample DataFrame
df1 = pd.DataFrame({'year': [2000, 2010, 2020],
                    'month': [1, 2, 3],
                    'day': [21, 4, 13]})
df1['date'] = pd.to_datetime(df1)
df1
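Assembly is not limited to year, month, and day. As a sketch under the assumption that time-of-day parts are stored in columns named 'hour' and 'minute', the same call can build full timestamps; the data here is invented for illustration.

```python
import pandas as pd

# Hypothetical component columns, including time of day.
parts = pd.DataFrame({"year": [2020, 2021],
                      "month": [7, 12],
                      "day": [13, 10],
                      "hour": [21, 4],
                      "minute": [15, 45]})

# Select only the component columns, so unrelated columns do not interfere.
stamps = pd.to_datetime(parts[["year", "month", "day", "hour", "minute"]])

print(stamps.tolist())
```

Selecting the component columns explicitly is a defensive choice: passing a whole DataFrame that also contains non-date columns would make the assembly fail.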

Date Range in Pandas 

There will be instances where you would need to create date sequences. To make this convenient, Pandas provides the date_range() function, which accepts a start date, an end date, and an optional frequency code (the default is ‘D’, for daily):


 
#ISO dates (YYYY-MM-DD) avoid any day/month ambiguity
pd.date_range(start='2022-04-13', end='2022-05-05')

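You could also use the datetime.today() function to start from today’s date and time, and use the periods and freq arguments to define how many timestamps to generate and how far apart they are:

#'min' is the minute frequency alias (the older alias 'T' is deprecated in pandas 2.2 and later)
pd.date_range(start=datetime.today(), periods=10, freq='min')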
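The frequency code changes what kind of sequence date_range() produces. The following sketch compares a daily range, a business-day range, and a 15-minute range; the start dates are arbitrary example values.

```python
import pandas as pd

# Every calendar day between two dates, both endpoints included.
daily = pd.date_range(start="2022-04-13", end="2022-05-05")

# Five business days (weekends skipped), starting on a Wednesday.
business = pd.date_range(start="2022-04-13", periods=5, freq="B")

# Four timestamps spaced 15 minutes apart.
quarter_hours = pd.date_range(start="2022-04-13 09:00", periods=4, freq="15min")

print(len(daily))          # 23
print(business[-1])        # 2022-04-19 00:00:00
print(quarter_hours[-1])   # 2022-04-13 09:45:00
```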

 

Endnotes

Dates and times contain a wealth of valuable information for data analysis. This article covered the built-in Pandas functions for converting, formatting, extracting, assembling, and generating DateTime values.

FAQs on Pandas Date and Time Function

What is the Pandas DateTime format?

The Pandas DateTime format is a standardized way of representing date and time, which includes components like year, month, day, hour, minute, and second. You can convert strings to DateTime objects using pd.to_datetime() for effective manipulation.

How do I convert a string to a DateTime object in Pandas?

You can convert a string to a DateTime object using the pd.to_datetime() function. For example:
import pandas as pd
timestamp = pd.to_datetime('2023-07-30 12:00:00')

How can I extract the year, month, and day from a DateTime object?

You can extract these components using the .dt accessor:

df['year'] = df['timestamp'].dt.year
df['month'] = df['timestamp'].dt.month
df['day'] = df['timestamp'].dt.day

How do I filter a DataFrame based on a specific date?

You can filter a DataFrame by creating a boolean mask. For example:

filtered_df = df[df['timestamp'] >= '2023-07-01']
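A runnable version of this kind of filter might look like the sketch below. The `events` DataFrame and its values are hypothetical; the second filter uses Series.between(), which includes both endpoints by default.

```python
import pandas as pd

# Hypothetical event log with a datetime column.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-06-15", "2023-07-01",
                                 "2023-07-20", "2023-08-02"]),
    "value": [1, 2, 3, 4],
})

# Boolean mask: rows on or after July 1 (the string is compared as a date).
since_july = events[events["timestamp"] >= "2023-07-01"]

# Rows within a closed date interval.
july_only = events[events["timestamp"].between("2023-07-01", "2023-07-31")]

print(since_july["value"].tolist())   # [2, 3, 4]
print(july_only["value"].tolist())    # [2, 3]
```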

How can I create a range of dates in Pandas?

Use pd.date_range() to create a range of dates:

date_range = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')

How do I resample time series data in Pandas?

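You can resample data using the resample() method, which expects the DataFrame to have a DatetimeIndex. For example, to get monthly averages (in pandas 2.2 and later, the month-end alias is ‘ME’ and ‘M’ is deprecated):

monthly_data = df.resample('M').mean()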
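The sketch below shows resampling end to end on a tiny made-up series. It uses the month-start alias 'MS' rather than a month-end alias, purely as an assumption to keep the example independent of the alias rename in recent pandas versions; the bins are labelled by the first day of each month.

```python
import pandas as pd

# Four daily readings that straddle a month boundary (Jan 30 to Feb 2).
idx = pd.date_range("2023-01-30", periods=4, freq="D")
readings = pd.DataFrame({"value": [10.0, 20.0, 30.0, 50.0]}, index=idx)

# Group the rows by calendar month and average each group.
monthly = readings.resample("MS").mean()

print(monthly)
```

Here the January bin averages 10 and 20, and the February bin averages 30 and 50.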

How can I handle time zones in Pandas?

You can localize and convert time zones using the dt.tz_localize() and dt.tz_convert() methods. The example assumes the timestamps are timezone-naive and were recorded in UTC:

df['timestamp'] = df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('America/New_York')
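To illustrate the two steps separately, this sketch first attaches a time zone to naive timestamps and then converts them. The sample timestamp is an assumption; note that conversion changes the wall-clock reading but not the underlying instant.

```python
import pandas as pd

# A timezone-naive timestamp, assumed to have been recorded in UTC.
naive = pd.Series(pd.to_datetime(["2023-07-30 12:00:00"]))

# Step 1: declare which zone the naive values belong to.
utc = naive.dt.tz_localize("UTC")

# Step 2: express the same instants in another zone.
new_york = utc.dt.tz_convert("America/New_York")

print(utc.iloc[0])        # 2023-07-30 12:00:00+00:00
print(new_york.iloc[0])   # 2023-07-30 08:00:00-04:00
```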

How do I find the difference between two dates?

You can subtract one DateTime column from another to get the difference:
df['duration'] = df['checkout_date'] - df['checkin_date']
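The resulting timedelta column has its own .dt accessor. The following sketch, using a hypothetical `stays` DataFrame, converts the durations into whole days and fractional hours.

```python
import pandas as pd

# Hypothetical check-in and check-out timestamps.
stays = pd.DataFrame({
    "checkin_date": pd.to_datetime(["2023-07-01 14:00", "2023-07-10 09:00"]),
    "checkout_date": pd.to_datetime(["2023-07-04 11:00", "2023-07-10 18:00"]),
})

# Subtracting datetime columns gives a timedelta column.
stays["duration"] = stays["checkout_date"] - stays["checkin_date"]

# Whole days only (the remainder is dropped) versus total hours.
stays["days"] = stays["duration"].dt.days
stays["hours"] = stays["duration"].dt.total_seconds() / 3600

print(stays[["duration", "days", "hours"]])
```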
About the Author
Vikram Singh
Assistant Manager - Content

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...