Adding Columns to Pandas DataFrame

Updated on Oct 3, 2023 11:55 IST

Learn how to effortlessly expand your Pandas DataFrame’s functionality by mastering the art of adding new columns. Explore strategies for inserting, transforming, and populating new columns to suit your analysis needs.

Pandas DataFrames are tabular data structures that store data in rows and columns, much like an Excel or CSV file. This article covers adding columns to a Pandas DataFrame.

During analysis, you perform several operations on a DataFrame using the functions provided in Pandas. We have already learned how to append rows to a Pandas DataFrame. In this article, we will learn how to add columns to a Pandas DataFrame using four methods: assign(), insert(), concat(), and apply(). We will also look at direct assignment of lists, Series, empty values, and constants.

For our purpose today, let’s create a sample DataFrame as shown below:

 
 
 
 
   
    # Importing the Pandas library
    import pandas as pd

    # Creating a sample DataFrame
    data = pd.DataFrame({
        'id': [101, 123, 139, 112, 133],
        'age': [10, 12, 13, 11, 12],
        'gender': ['M', 'F', 'F', 'M', 'M'],
        'group': ['first', 'second', 'first', 'third', 'third'],
        'math_score': [41.5, 43, 38, 47, 29.5]
    })

    data

 

Our dummy dataset comprises 5 columns: ‘id’, ‘age’, ‘gender’, ‘group’, and ‘math_score’. As you can observe, it contains both numerical and categorical variables.

Let’s see how we perform the operations to add column(s) to this dataset.

Adding a column using List

The simplest way to add a column to an existing DataFrame is to create a list and assign it to a new column.

 
 
 
 
   
    values = [40, 38, 32.5, 27, 30]

    data['science_score'] = values
    data
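
One thing to keep in mind with list assignment is that the list needs one value per row. The sketch below uses a small stand-in DataFrame (the name `df` and its values are made up for illustration, not taken from the article's dataset) to show what happens when the lengths differ.

```python
import pandas as pd

# Hypothetical three-row frame used only for this illustration
df = pd.DataFrame({"id": [101, 123, 139]})

# A list with one value per row is accepted
df["science_score"] = [40, 38, 32.5]

# A list that is too short is rejected with a ValueError
try:
    df["english_score"] = [48.5, 42]
    length_error = None
except ValueError as exc:
    length_error = exc

print(length_error)
```

Because the failed assignment raises before anything is written, the DataFrame keeps only the columns that were added successfully.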

Adding a column using Pandas Series

A single column is nothing but a Pandas Series, that is, a one-dimensional labeled array whose values share a single dtype.

You can simply assign the values of your Series into the existing DataFrame to add a new column:

 
 
 
 
   
    series = pd.Series([40, 38, 32.5, 27, 30], index=[0, 1, 2, 3, 4])

    data['science_score'] = series.values
    data

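Note that assigning a Series directly (rather than its .values) aligns it to the DataFrame by index label. Rows whose index has no match in the Series receive NaN, and Series entries whose labels are absent from the DataFrame are dropped: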

 
 
 
 
   
    data['science_score'] = pd.Series([40, 38, 32.5, 27, 30],
                                      index=[1, 2, 3, 4, 5])
    print(data)
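
To make the alignment behaviour concrete, here is a minimal sketch on a three-row stand-in DataFrame (`df`, `scores`, and the column names `aligned` and `by_position` are illustrative assumptions). It contrasts assigning the Series itself with assigning its underlying array.

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 123, 139]})           # index 0, 1, 2
scores = pd.Series([40, 38, 32.5], index=[1, 2, 3])  # index 1, 2, 3

# Assigning the Series itself aligns on index labels:
# row 0 has no match and becomes NaN; label 3 is dropped.
df["aligned"] = scores

# Assigning the underlying array ignores the labels and fills by position.
df["by_position"] = scores.to_numpy()

print(df)
```

Which behaviour is wanted depends on whether the index carries meaning. If it does, letting pandas align is usually the safer choice.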

Adding columns using assign()

You can use the assign() function to insert multiple new columns in a DataFrame when:

  • Index of the new column can be ignored
  • Values of an existing column need to be overwritten

This method returns a new DataFrame containing all the original columns along with the new ones. The original DataFrame is left unchanged, so assign the result to a variable if you want to keep it.

 
 
 
 
   
    s1 = pd.Series([40.5, 38.5, 33, 28, 31], index=[0, 1, 2, 3, 4])
    s2 = pd.Series([48.5, 42, 41, 37, 43], index=[0, 1, 2, 3, 4])

    # Returns a new DataFrame; data itself is not modified
    data.assign(science_score=s1.values, english_score=s2.values)
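
As a small sketch of the copy-returning behaviour, the example below uses a two-row stand-in DataFrame (`df`, `with_total`, and `total_score` are illustrative names). It also passes a callable to assign(), which pandas evaluates against the DataFrame being assigned to.

```python
import pandas as pd

df = pd.DataFrame({
    "math_score": [41.5, 43.0],
    "science_score": [40.0, 38.0],
})

# assign() returns a new DataFrame; df itself is not modified
with_total = df.assign(
    total_score=lambda d: d["math_score"] + d["science_score"]
)

print(list(df.columns))
print(list(with_total.columns))
```

Returning a copy is what makes assign() convenient in method chains, at the cost of having to capture the result.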

Adding a column using insert()

You can use the insert() function when you need to insert a new column at a specific position. Unlike assign(), it modifies the DataFrame in place.

 
 
 
 
   
    # Using the Series s2 created above
    data.insert(len(data.columns), 'english_score', s2.values)
    print(data)

What if you wanted to insert the english_score before the math_score?

 
 
 
 
   
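    # Using the Series s2 created above.
    # Drop the english_score column appended in the previous step first;
    # otherwise insert() raises a ValueError for the duplicate name.
    data = data.drop(columns='english_score')
    data.insert(4, 'english_score', s2.values)
    print(data)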
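
Hard-coding the position works, but it breaks if the column order changes. One possible alternative, sketched here on a stand-in DataFrame (`df` and `position` are illustrative names), is to look the position up with the column index's get_loc() method.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [101, 123],
    "group": ["first", "second"],
    "math_score": [41.5, 43.0],
})

# Find where math_score currently sits instead of hard-coding the position
position = df.columns.get_loc("math_score")

# insert() modifies df in place and returns None
df.insert(position, "english_score", [48.5, 42.0])

print(list(df.columns))
```

This assumes the column names are unique, in which case get_loc() returns a single integer.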

 

Adding duplicate columns using insert()

The allow_duplicates parameter of insert() is False by default, so insert() raises a ValueError if the new column name already exists in the DataFrame. Set it to True to allow a duplicate name:

 
 
 
 
   
    s3 = pd.Series([43, 34, 33.5, 29, 47], index=[0, 1, 2, 3, 4])

    data.insert(5, 'english_score', s3.values, allow_duplicates=True)
    print(data)

As you can observe, there are two english_score columns in the above DataFrame.
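
Duplicate names change how column selection behaves, which is worth seeing once. The sketch below (with a stand-in `df` and illustrative variable names) shows the default error and then what selecting the shared name returns.

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 123], "english_score": [48.5, 42.0]})

# Without allow_duplicates=True, an existing name is rejected
try:
    df.insert(1, "english_score", [43, 34])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

# With allow_duplicates=True the second column is added
df.insert(1, "english_score", [43, 34], allow_duplicates=True)

# Selecting by the shared name now returns both columns as a DataFrame
both = df["english_score"]
print(type(both).__name__, both.shape)
```

Since code that expects a Series from `df["english_score"]` will now receive a DataFrame, unique column names are usually easier to work with.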

Adding a column using concat()

You can concatenate a new column to an existing DataFrame by calling pd.concat() with axis=1. The output is a new DataFrame with the concatenated column.

 
 
 
 
   
    s4 = pd.Series([48, 46, 43.5, 49, 47], index=[0, 1, 2, 3, 4])

    data = pd.concat([data, s4.rename('PE_score')], axis=1)
    data
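
Like direct Series assignment, concat() with axis=1 matches rows by index label, but by default it keeps the union of the labels rather than dropping the extras. A minimal sketch, using a stand-in `df` and a Series `pe` whose index is deliberately shifted:

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 123, 139]})                        # index 0, 1, 2
pe = pd.Series([48, 46, 43.5], index=[1, 2, 3], name="PE_score")  # index 1, 2, 3

# Default join="outer": the result has the union of both indexes
outer = pd.concat([df, pe], axis=1)

# join="inner": only labels present in both are kept
inner = pd.concat([df, pe], axis=1, join="inner")

print(outer)
print(inner)
```

When the indexes are identical, as in the article's example, both joins give the same result.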

Adding a column using apply()

When performing data manipulation, you might need to add a new column based on the values in one or more existing columns. For this, the apply() method can be used as shown:

 
 
 
 
   
    data['avg_score'] = data.apply(
        lambda row: (row.math_score + row.science_score) / 2,
        axis=1)
    data

As shown in the above DataFrame, we have calculated the average score based on the math_score and science_score columns using the lambda function.

Setting axis=1 makes apply() call the function once per row, so the lambda can read several columns of the same row.
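
For simple arithmetic like this average, plain column arithmetic is a common alternative to a row-wise apply(). The sketch below, on a stand-in DataFrame with illustrative variable names, computes the same values both ways.

```python
import pandas as pd

df = pd.DataFrame({
    "math_score": [41.5, 43.0, 38.0],
    "science_score": [40.0, 38.0, 32.5],
})

# Row-wise apply, as in the article
via_apply = df.apply(
    lambda row: (row.math_score + row.science_score) / 2, axis=1
)

# Column arithmetic: the same calculation expressed on whole columns
via_arithmetic = (df["math_score"] + df["science_score"]) / 2

print(via_apply.tolist())
print(via_arithmetic.tolist())
```

The column form avoids calling a Python function for every row, so it tends to scale better on large DataFrames. apply() remains useful when the per-row logic cannot be expressed as column operations.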

Adding an empty column

You can also add an empty column to the DataFrame by assigning a missing-value marker such as pd.NaT to a column name. Note that the example below reuses the avg_score name, so it overwrites the averages calculated in the previous section:

 
 
 
 
   
    data['avg_score'] = pd.NaT
    data

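pd.NaT ("Not a Time") is the Pandas missing-value marker for datetime-like data. For a column that will later hold numbers, np.nan (or pd.NA) is the more usual placeholder.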
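
The choice of placeholder affects the dtype of the new column. As a rough sketch (the stand-in `df` and the column name `exam_date` are assumptions made for illustration), the following adds one empty column with np.nan and one with pd.NaT and prints the resulting dtypes so they can be compared.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [101, 123, 139]})

# np.nan gives a float column that is ready for numeric values later
df["avg_score"] = np.nan

# pd.NaT is the datetime-like missing marker
df["exam_date"] = pd.NaT

print(df.dtypes)
print(df.isna().sum())
```

Both columns report every row as missing, so isna() works the same way whichever marker is used.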

Adding a column with a constant value

You can assign a single value to all elements in a new column, as shown:

 
 
 
 
   
    data['total_score'] = 50
    data

 

Endnotes

When inserting new columns into your Pandas DataFrame, pick the method that best fits your requirement: direct assignment for simple cases, assign() when you want a new DataFrame, insert() when position matters, concat() for joining along columns, and apply() for values derived from existing columns. Pandas is a powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis.


Contributed by – Prerna Singh
