Working with Date and Time in Pandas
Using the correct data type is crucial for accurate analysis. While Pandas can usually infer data types accurately, there are scenarios where explicit data conversion is necessary. This article will demonstrate how Pandas manages dates and times using the DateTime data type.
Date and time features play a critical role in various data science fields, such as marketing, sales, e-commerce, and retail.
Today, we will cover the following sections:
- DateTime Objects in Pandas
- Timedelta Objects in Pandas
- DateTime Format
- Handling Parsing Errors
- Extract DateTime into Different Columns
- Assemble DateTime from Different Columns
- Date Range in Pandas
- Endnotes
DateTime Objects in Pandas
Convert string to DateTime
Pandas has a built-in function called to_datetime() that converts dates and times stored as strings into DateTime objects.
To understand this through an example, let’s create a sample DataFrame:
```python
# Importing required libraries
import pandas as pd
import numpy as np
from datetime import datetime

# Creating a sample DataFrame
df = pd.DataFrame({'date': ['04/03/2022', '05/03/2022', '06/03/2022'],
                   'patients': [16, 19, 11]})
df
```
Let’s check the data types of the DataFrame:
```python
df.dtypes
```
As you can see, the ‘date’ column is currently stored as strings (the object dtype).
Let’s parse this to datetime, as shown:
```python
df['date'] = pd.to_datetime(df['date'])
df
```
Let’s check the converted data type:
```python
df.dtypes
```
Thus, to_datetime() converts the column to a series of the appropriate datetime64 dtype.
Convert integer to DateTime
The to_datetime() function can also convert integers to DateTime objects. Two arguments control how the integer is interpreted:
- unit: what the integer counts, such as days ('D') or seconds ('s'); for numeric input it defaults to nanoseconds ('ns')
- origin: the reference point you start counting your ‘units’ from; it defaults to 'unix' (1970-01-01)
For example,
```python
pd.to_datetime(14667, unit='D', origin='unix')
```
Here, unit='D' means the integer counts days, and origin='unix' starts the count at the Unix epoch, 1970-01-01. The result is the timestamp 14667 days after the epoch.
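To make the role of unit and origin concrete, the sketch below rebuilds the same timestamp by hand: the origin as a Timestamp plus a 14667-day Timedelta. It only uses pd.Timestamp and pd.Timedelta and assumes nothing beyond the call shown above.

```python
import pandas as pd

# Integer day count interpreted relative to the Unix epoch (1970-01-01)
ts = pd.to_datetime(14667, unit='D', origin='unix')

# The same instant built by hand: epoch plus a 14667-day Timedelta
manual = pd.Timestamp('1970-01-01') + pd.Timedelta(days=14667)

print(ts)            # 2010-02-27 00:00:00
print(ts == manual)  # True
```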
Let’s take another example,
```python
pd.to_datetime(996209999, unit='s', origin='2000-01-01')
```
The timestamp displays the date and time 996209999 seconds from 2000-01-01. Phew, that’s a lot of seconds!
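A custom origin only shifts the starting point, so the result again equals origin plus a Timedelta of the given length. The same call also works on a whole column. The sketch below uses a hypothetical `epoch_s` column of Unix timestamps in seconds to illustrate this. That data is made up for the example.

```python
import pandas as pd

# A custom origin shifts the reference point; the integer is still counted in `unit`
ts = pd.to_datetime(996209999, unit='s', origin='2000-01-01')
manual = pd.Timestamp('2000-01-01') + pd.Timedelta(seconds=996209999)
print(ts)            # 2031-07-27 04:59:59
print(ts == manual)  # True

# Hypothetical column of Unix timestamps in seconds (origin defaults to 'unix')
events = pd.DataFrame({'epoch_s': [0, 86400, 1_000_000_000]})
events['when'] = pd.to_datetime(events['epoch_s'], unit='s')
print(events)
```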
Timedelta Objects in Pandas
When you subtract one date from another, the duration is returned as a Timedelta object. Pandas also has a built-in function called to_timedelta() that converts numbers or strings into Timedelta objects, which you can add to or subtract from dates.
Let’s understand this through an example:
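```python
# Timedelta - duration between dates
from datetime import datetime
import pandas as pd

date = datetime.now()  # present date
print(date)

# Date tomorrow: add a one-day Timedelta
print(date + pd.to_timedelta(1, unit='D'))

# Date after a month: Timedelta has no month unit ('m' means minutes),
# so a calendar-aware DateOffset is used instead
print(date + pd.DateOffset(months=1))
```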
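Here, too, the unit argument sets what the number counts: days ('D'), hours ('h'), minutes ('m'), seconds ('s'), and so on. Months and years are not valid Timedelta units because their length varies, which is why pd.DateOffset is used for calendar months.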
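to_timedelta() also works on a whole column, so a column of integers can become durations that are added row by row to a datetime column. The sketch below uses hypothetical admission data (`admitted`, `stay_days`) to show this.

```python
import pandas as pd

# Hypothetical admissions data: admission date and length of stay in days
stays = pd.DataFrame({
    'admitted': pd.to_datetime(['2022-03-04', '2022-03-05', '2022-03-06']),
    'stay_days': [3, 10, 1],
})

# to_timedelta converts the integer column into timedelta64 values...
stays['stay'] = pd.to_timedelta(stays['stay_days'], unit='D')

# ...which are added element-wise to the datetime column
stays['discharged'] = stays['admitted'] + stays['stay']
print(stays)
```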
DateTime Format
By default, the to_datetime() function parses ambiguous strings such as 04/03/2022 month-first (MM/DD/YYYY). If you know the exact layout of your strings, you can specify it with the format argument:
```python
pd.to_datetime(df['date'], format="%m/%d/%Y")  # replace with the layout your data actually uses
```
However, if you’re outside the United States, you probably write dates day-first (DD/MM/YYYY).
You can make Pandas parse the day first instead of the month by setting the argument dayfirst=True:
```python
df = pd.DataFrame({'date': ['4/03/2022', '5/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df
```
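The effect of dayfirst is easiest to see on a single ambiguous string. The same text gives two different dates depending on the flag. The pandas documentation notes that dayfirst is a parsing preference, not a strict rule, so an explicit format remains the safer choice when the layout is known.

```python
import pandas as pd

s = '04/03/2022'

# Default: the first number is read as the month -> 3 April 2022
month_first = pd.to_datetime(s)

# dayfirst=True: the first number is read as the day -> 4 March 2022
day_first = pd.to_datetime(s, dayfirst=True)

print(month_first.month, month_first.day)  # 4 3
print(day_first.month, day_first.day)      # 3 4
```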
Custom format
The format argument of the to_datetime() function also accepts a custom format built from strftime-style codes. For example, let’s say you want to parse strings with the layout YYYY-MM-DD HH:MM:SS. Let’s see how we can do this:
```python
# Creating another sample DataFrame
df0 = pd.DataFrame({'time_of_death': ['2020-7-13 21:15:0', '2020-7-9 17:30:30', '2020-12-10 4:45:12'],
                    'patientID': [101, 23, 48]})
df0['time_of_death'] = pd.to_datetime(df0['time_of_death'], format="%Y-%m-%d %H:%M:%S")
df0
```
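Other layouts use other codes from the same family. The sketch below parses hypothetical strings that contain an abbreviated month name and no seconds. It assumes English month abbreviations, which is what %b matches under the default locale.

```python
import pandas as pd

# Hypothetical strings with an abbreviated month name and no seconds
raw = pd.Series(['13-Jul-2020 21:15', '09-Jul-2020 17:30', '10-Dec-2020 04:45'])

# %d = day, %b = abbreviated month name, %Y = 4-digit year,
# %H = hour (24-hour clock), %M = minute
parsed = pd.to_datetime(raw, format='%d-%b-%Y %H:%M')
print(parsed)
```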
Handling Parsing Errors
If a date string cannot be parsed, to_datetime() raises a ValueError (or a subclass of it, depending on the pandas version), as shown:
```python
df = pd.DataFrame({'date': ['4/03/2022', 'x/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)  # raises an error on 'x/03/2022'
```
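To avoid this, set the errors argument. With errors='ignore', the input is returned unchanged (this option is deprecated since pandas 2.2):
```python
df['date'] = pd.to_datetime(df['date'], dayfirst=True, errors='ignore')
df
```
With errors='coerce', values that cannot be parsed become NaT (Not a Time):
```python
df['date'] = pd.to_datetime(df['date'], dayfirst=True, errors='coerce')
df
```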
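After coercing, the failed rows are easy to find because NaT counts as missing. The sketch below reuses the sample data from this section and locates the unparseable rows with isna(), so you can inspect or fix them rather than silently dropping them.

```python
import pandas as pd

df = pd.DataFrame({'date': ['4/03/2022', 'x/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})

# Unparseable entries become NaT instead of raising
df['date'] = pd.to_datetime(df['date'], dayfirst=True, errors='coerce')

# Locate the rows that failed to parse
bad_rows = df[df['date'].isna()]
print(bad_rows)
print(df['date'].isna().sum())  # 1
```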
Extract DateTime into Different Columns
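A datetime Series exposes the following attributes through the .dt accessor:
- dt.year – to get the year
- dt.month – to get the month
- dt.day – to get the day of the month
- dt.isocalendar().week – to get the ISO week of the year (dt.week was deprecated and removed in pandas 2.0)
- dt.dayofweek – to get the day of the week, with Monday=0 and Sunday=6
- dt.is_leap_year – to check whether the date falls in a leap year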
Let’s see how we can get these separately:
```python
df = pd.DataFrame({'date': ['4/03/2022', '5/03/2022', '6/03/2022'],
                   'patients': [16, 19, 11]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df
```
```python
df['week_of_year'] = df['date'].dt.isocalendar().week  # replaces the removed dt.week
df['day_of_week'] = df['date'].dt.dayofweek
df['is_leap_year'] = df['date'].dt.is_leap_year
df
```
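Because dayofweek is just an integer, it helps to see it next to the weekday name. The sketch below does this for the sample dates using dt.day_name(), which is not mentioned above but is a standard .dt method. The three dates fall on Friday through Sunday of ISO week 9.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['4/03/2022', '5/03/2022', '6/03/2022']), dayfirst=True)

# dayofweek counts from Monday=0 to Sunday=6; day_name() gives the label
summary = pd.DataFrame({
    'date': dates,
    'day_of_week': dates.dt.dayofweek,
    'day_name': dates.dt.day_name(),
    'iso_week': dates.dt.isocalendar().week,
})
print(summary)
```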
Assemble DateTime from Different Columns
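You can also use the to_datetime() function to assemble a datetime from multiple columns. The column labels must be year, month and day (plural forms such as years are also accepted). Optional columns such as hour, minute and second add a time component: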
```python
# Creating another sample DataFrame
df1 = pd.DataFrame({'year': [2000, 2010, 2020],
                    'month': [1, 2, 3],
                    'day': [21, 4, 13]})
df1['date'] = pd.to_datetime(df1)
df1
```
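When the table also holds time parts, passing only the date and time columns gives a full timestamp. The sketch below uses a hypothetical `df2` with hour and minute columns. Selecting the columns explicitly keeps any unrelated columns out of the call.

```python
import pandas as pd

# Hypothetical table with separate date and time parts
df2 = pd.DataFrame({'year': [2020, 2020],
                    'month': [7, 12],
                    'day': [13, 10],
                    'hour': [21, 4],
                    'minute': [15, 45]})

# year/month/day are required; hour and minute are optional extras
df2['timestamp'] = pd.to_datetime(df2[['year', 'month', 'day', 'hour', 'minute']])
print(df2)
```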
Date Range in Pandas
Sometimes you need to create a sequence of dates. To make this convenient, Pandas provides the date_range() function, which accepts a start date, an end date, and an optional frequency code (the default is 'D', daily). Writing the dates in ISO format (YYYY-MM-DD) avoids any day/month ambiguity:
```python
pd.date_range(start='2022-04-13', end='2022-05-05')
```
You could also use the datetime.today() function as the start date and set periods to the number of timestamps you want to generate:
```python
pd.date_range(start=datetime.today(), periods=10, freq='min')  # 'min' = minutely; older code uses 'T'
```
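The freq argument accepts other pandas frequency aliases as well. The sketch below uses 'MS' (month start) and 'W-MON' (weekly, on Mondays) with arbitrary example dates. 'MS' is used here instead of 'M' because recent pandas versions rename the month-end alias.

```python
import pandas as pd

# Monthly sequence anchored to the first day of each month
month_starts = pd.date_range(start='2022-01-01', periods=3, freq='MS')

# Every Monday between two dates
mondays = pd.date_range(start='2022-04-13', end='2022-05-05', freq='W-MON')

print(month_starts)
print(mondays)
```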
Endnotes
Dates and times contain a wealth of valuable information for data analysis. This article covered the built-in Pandas functions for converting, parsing, decomposing, assembling, and generating DateTime values, which make it straightforward to prepare time-based data for analysis.
FAQs on Pandas Date and Time Function
What is the Pandas DateTime format?
Pandas stores dates and times in the datetime64[ns] dtype (single values are Timestamp objects), which holds components such as year, month, day, hour, minute, and second. You can convert strings to this type using pd.to_datetime() so they can be manipulated effectively.
How do I convert a string to a DateTime object in Pandas?
You can convert a string to a DateTime object using the pd.to_datetime() function. For example:
```python
import pandas as pd

timestamp = pd.to_datetime('2023-07-30 12:00:00')
```
How can I extract the year, month, and day from a DateTime object?
You can extract these components using the .dt accessor, for example df['date'].dt.year, df['date'].dt.month, and df['date'].dt.day.
How do I filter a DataFrame based on a specific date?
You can filter a DataFrame by creating a boolean mask that compares a datetime column with a date.
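A minimal sketch with made-up data follows. Pandas converts the date strings in the comparison to timestamps. Note that == matches only values exactly at midnight, so a range check such as between() is safer when the column also carries times.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2022-03-04', '2022-03-05', '2022-03-06']),
                   'patients': [16, 19, 11]})

# Rows on one specific date
on_date = df[df['date'] == '2022-03-05']

# Rows within a date window (between() includes both ends by default)
window = df[df['date'].between('2022-03-05', '2022-03-06')]

print(on_date)
print(window)
```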
How can I create a range of dates in Pandas?
Use pd.date_range(), passing a start date together with either an end date or a number of periods, and optionally a frequency.
How do I resample time series data in Pandas?
You can resample time-indexed data using the resample() method followed by an aggregation. For example, resample('MS').mean() gives monthly averages.
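A minimal sketch, assuming a Series with a DatetimeIndex (resample needs one). The readings are synthetic values 0, 1, 2, ... for each day of the first quarter of 2022. 'MS' groups by calendar month and labels each bin with the month's first day.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings for January to March 2022
idx = pd.date_range('2022-01-01', '2022-03-31', freq='D')
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Monthly averages, one row per calendar month
monthly_mean = daily.resample('MS').mean()
print(monthly_mean)
```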
How can I handle time zones in Pandas?
You can localize and convert time zones using the dt.tz_localize() and dt.tz_convert() methods.
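tz_localize attaches a zone to naive timestamps without changing the clock time, while tz_convert expresses the same instants in another zone. The sketch below uses IANA zone names and assumes time-zone data is available on the system. The two sample dates fall on either side of the US daylight-saving switch, so their UTC offsets differ.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2022-03-04 09:00', '2022-07-04 09:00']))

# Attach a time zone to naive timestamps (clock time unchanged)
local = s.dt.tz_localize('America/New_York')

# Express the same instants in UTC
utc = local.dt.tz_convert('UTC')
print(utc)
```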
How do I find the difference between two dates?
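Subtract one date from another. For two scalars, the result is a Timedelta, and its .days attribute gives whole days. For two datetime columns, the result is a timedelta64 Series, and .dt.days gives whole days per row. A minimal sketch with made-up dates:

```python
import pandas as pd

# Two scalar dates: subtraction returns a Timedelta
gap = pd.Timestamp('2022-05-05') - pd.Timestamp('2022-04-13')
print(gap)        # 22 days 00:00:00
print(gap.days)   # 22

# Two datetime columns: subtraction returns a timedelta64 Series
df = pd.DataFrame({'start': pd.to_datetime(['2022-03-04', '2022-03-05']),
                   'end': pd.to_datetime(['2022-03-10', '2022-04-05'])})
df['days_between'] = (df['end'] - df['start']).dt.days
print(df)
```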
Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...