Adding Columns to Pandas DataFrame
Learn how to effortlessly expand your Pandas DataFrame’s functionality by mastering the art of adding new columns. Explore strategies for inserting, transforming, and populating new columns to suit your analysis needs.
Pandas DataFrames are tabular data structures that store data in rows and columns, much like an Excel sheet or a CSV file. This article covers adding columns to a Pandas DataFrame.
During analysis, you perform several operations on a DataFrame using the functions provided in Pandas. We have already learned how to append rows to a Pandas DataFrame. In this article, we will learn how to add columns to a Pandas DataFrame, both by direct assignment and by using four methods: assign(), insert(), concat() and apply().
We are going to cover the following sections:
- Adding a column using List
- Adding a column using Pandas Series
- Adding columns using assign()
- Adding a column using insert()
- Adding a column using concat()
- Adding a column using apply()
- Adding an empty column
- Adding a column with a constant value
- Endnotes
For our purpose today, let’s create a sample DataFrame as shown below:
# Importing the Pandas library
import pandas as pd

# Creating a sample DataFrame
data = pd.DataFrame({
    'id': [101, 123, 139, 112, 133],
    'age': [10, 12, 13, 11, 12],
    'gender': ['M', 'F', 'F', 'M', 'M'],
    'group': ['first', 'second', 'first', 'third', 'third'],
    'math_score': [41.5, 43, 38, 47, 29.5]
})
data
Our dummy dataset comprises five columns: ‘id’, ‘age’, ‘gender’, ‘group’, and ‘math_score’. As you can observe, it contains both numerical and categorical variables.
Let’s see how we perform the operations to add column(s) to this dataset.
Adding a column using List
The simplest way to add a column to an existing DataFrame is to create a list and assign it to a new column.
values = [40, 38, 32.5, 27, 30]
data['science_score'] = values
data
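One thing to keep in mind with this approach is that the list needs exactly one value per row. The following is a minimal sketch on a trimmed-down copy of the sample data; the names `df`, `too_short`, `bad_column` and `length_error` are illustrative and not part of the article's dataset.

```python
import pandas as pd

# Trimmed-down version of the sample DataFrame (5 rows)
df = pd.DataFrame({
    'id': [101, 123, 139, 112, 133],
    'math_score': [41.5, 43, 38, 47, 29.5],
})

# A list with one value per row is accepted
df['science_score'] = [40, 38, 32.5, 27, 30]

# A list of the wrong length is rejected with a ValueError
too_short = [40, 38]
try:
    df['bad_column'] = too_short
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print(length_error)
```

The failed assignment leaves the DataFrame unchanged, so no partially filled column is created.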
Adding a column using Pandas Series
A single column of a DataFrame is a Pandas Series, that is, a one-dimensional labeled array whose values share a single dtype.
You can simply assign the values of your Series into the existing DataFrame to add a new column:
series = pd.Series([40, 38, 32.5, 27, 30], index=[0, 1, 2, 3, 4])
data['science_score'] = series.values
data
Note that if you assign the Series itself (rather than series.values), Pandas aligns it with the DataFrame on the index. Rows whose index label is missing from the Series get NaN, and Series labels that do not exist in the DataFrame are dropped:
# Index 0 has no matching label, so it becomes NaN; label 5 is not in data, so it is dropped
data['science_score'] = pd.Series([40, 38, 32.5, 27, 30], index=[1, 2, 3, 4, 5])
print(data)
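To make the difference between label-based and position-based assignment concrete, here is a small sketch on a three-row DataFrame. The column names `aligned` and `by_position` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 123, 139]})             # index labels 0, 1, 2
shifted = pd.Series([40, 38, 32.5], index=[1, 2, 3])   # index labels 1, 2, 3

# Assigning the Series aligns on index labels:
# row 0 has no match and becomes NaN; label 3 is not in df and is dropped
df['aligned'] = shifted

# Assigning the underlying array ignores labels and fills by position
df['by_position'] = shifted.to_numpy()

print(df)
```

If the order of values is what matters rather than the labels, passing `.values` or `.to_numpy()` (as in the earlier snippet) avoids unexpected NaN values.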
Adding columns using assign()
You can use the assign() function to insert multiple new columns in a DataFrame when:
- Index of the new column can be ignored
- Values of an existing column need to be overwritten
This method returns a new DataFrame object, a copy of the original that contains all the original columns along with the new ones. The original DataFrame is not modified, so store the result in a variable if you want to keep it.
s1 = pd.Series([40.5, 38.5, 33, 28, 31], index=[0, 1, 2, 3, 4])
s2 = pd.Series([48.5, 42, 41, 37, 43], index=[0, 1, 2, 3, 4])
data.assign(science_score=s1.values, english_score=s2.values)
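As a rough illustration of the copy semantics, the sketch below calls assign() on a small DataFrame and checks that the original is untouched. It also passes a callable, which assign() evaluates against the DataFrame; the `total` column and the variable names are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'math_score': [41.5, 43, 38],
    'science_score': [40, 38, 32.5],
})

# assign() returns a new DataFrame; df itself is left untouched
result = df.assign(
    english_score=[48.5, 42, 41],
    # A callable receives the DataFrame and returns the new column
    total=lambda d: d['math_score'] + d['science_score'],
)

print(df.columns.tolist())      # original columns only
print(result.columns.tolist())  # original columns plus the two new ones
```

Because nothing is changed in place, assign() fits well into chained expressions.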
Adding a column using insert()
You can use the insert() function when you need to insert a new column in a specific position or index.
# Using the Series s2 created above
data.insert(len(data.columns), 'english_score', s2.values)
print(data)
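What if you wanted to insert english_score before math_score? Pass the target position as the first argument. Because insert() modifies the DataFrame in place, the english_score column added by the previous snippet has to be removed first; otherwise insert() raises a ValueError:
# Using the Series s2 created above
# Drop the column added in the previous step (ignored if it does not exist)
data = data.drop(columns='english_score', errors='ignore')
data.insert(4, 'english_score', s2.values)
print(data)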
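Instead of hard-coding the position, you can look it up from an existing column name. The sketch below uses `columns.get_loc()` for that on a reduced version of the sample data; the variable names `position` and `returned` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 123, 139],
    'group': ['first', 'second', 'first'],
    'math_score': [41.5, 43, 38],
})

# Look up the position of an existing column instead of hard-coding it
position = df.columns.get_loc('math_score')

# insert() works in place and returns None
returned = df.insert(position, 'english_score', [48.5, 42, 41])

print(df.columns.tolist())
```

Since insert() returns None, writing `df = df.insert(...)` would discard the DataFrame.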
Adding duplicate columns using insert()
The allow_duplicates parameter is False by default, so insert() raises a ValueError if a column with the same name already exists. Set it to True to add a column with a duplicate name:
s3 = pd.Series([43, 34, 33.5, 29, 47], index=[0, 1, 2, 3, 4])
data.insert(5, 'english_score', s3.values, allow_duplicates=True)
print(data)
As you can observe, there are two english_score columns in the above DataFrame.
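The following sketch shows both behaviours side by side on a two-row DataFrame: the default call fails, and the call with allow_duplicates=True succeeds. The variable names are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 123], 'english_score': [48.5, 42]})

# Default behaviour: a duplicate column name is rejected
try:
    df.insert(1, 'english_score', [43, 34])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

# Opting in to duplicates explicitly
df.insert(1, 'english_score', [43, 34], allow_duplicates=True)

# Selecting a duplicated name returns a DataFrame with both columns, not a Series
both = df['english_score']

print(duplicate_error)
print(both)
```

Duplicate names make column selection ambiguous, so it is usually worth considering a distinct name instead.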
Adding a column using concat()
You can concatenate a new column to an existing DataFrame by setting axis=1. The output would be a new DataFrame with the concatenated column.
s4 = pd.Series([48, 46, 43.5, 49, 47], index=[0, 1, 2, 3, 4])
data = pd.concat([data, s4.rename('PE_score')], axis=1)
data
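pd.concat() can also attach several columns at once, and with axis=1 it lines the pieces up on their index labels. A minimal sketch, where `art_score` is a made-up column that deliberately lacks a value for the last row:

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 123, 139]})

pe = pd.Series([48, 46, 43.5], name='PE_score')
art = pd.Series([35, 39], index=[0, 1], name='art_score')  # no value for row 2

# Each named Series becomes a column; missing labels are filled with NaN
combined = pd.concat([df, pe, art], axis=1)

print(combined)
```

The Series name becomes the column name, which is why the snippet above calls rename() on s4.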
Adding a column using apply()
When performing data manipulation, you might need to add a new column based on the values in one or more existing columns. For this, the apply() method can be used as shown:
data['avg_score'] = data.apply(lambda row: (row.math_score + row.science_score) / 2, axis=1)
data
As shown in the above DataFrame, we have calculated the average score based on the math_score and science_score columns using the lambda function.
Setting axis=1 makes apply() pass each row to the function, one at a time, so the lambda can combine values from several columns of the same row.
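For simple arithmetic like this average, a vectorized expression usually gives the same result without calling a Python function for every row. The sketch below compares the two approaches on a three-row DataFrame; the column names `avg_apply` and `avg_vectorized` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'math_score': [41.5, 43, 38],
    'science_score': [40, 38, 32.5],
})

# Row-wise apply: the lambda is called once per row
df['avg_apply'] = df.apply(
    lambda row: (row['math_score'] + row['science_score']) / 2, axis=1
)

# Vectorized alternative: mean across the two columns of each row
df['avg_vectorized'] = df[['math_score', 'science_score']].mean(axis=1)

print(df)
```

apply() remains useful when the per-row logic is too complex to express with column-wise operations.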
Adding an empty column
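You can also add an empty column to the DataFrame by assigning a missing-value marker such as pd.NaT to a new column. Let’s add an empty column to our DataFrame (note that this overwrites the avg_score column computed above):
data['avg_score'] = pd.NaT
data
pd.NaT ("Not a Time") is the missing-value marker Pandas uses for datetime-like data. For an empty numeric column such as a score, np.nan (from NumPy) or pd.NA is usually the more appropriate choice.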
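The choice of marker affects the dtype of the resulting column. As a small sketch (the `exam_date` column is hypothetical and only serves to contrast the two markers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [101, 123, 139]})

df['avg_score'] = np.nan   # float column, every value missing
df['exam_date'] = pd.NaT   # datetime-like column, every value missing

# Inspect the dtype each marker produced
print(df.dtypes)
print(df.isna().all())
```

Starting a numeric column as float NaN means values can be filled in later without changing the column's type.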
Adding a column with a constant value
You can assign a single value to all elements in a new column, as shown:
data['total_score'] = 50
data
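The scalar is broadcast to every row, and this works for any scalar type, not only numbers. A brief sketch, where the `term` and `passed` columns are hypothetical examples:

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 123, 139]})

# A scalar is repeated for every row of the DataFrame
df['total_score'] = 50
df['term'] = 'spring'

# The same broadcasting applies when using assign()
result = df.assign(passed=True)

print(result)
```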
Endnotes
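When adding new columns to your Pandas DataFrame, pick the method that best suits your requirement: direct assignment for simple cases, assign() when you want a new DataFrame, insert() when position matters, concat() to join existing Series or DataFrames, and apply() to derive values from other columns. Pandas is a very powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis. If you seek to learn the basics and various functions of Pandas, you can explore our related articles.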
Contributed by – Prerna Singh