Tutorial – VLOOKUP in Pandas

Updated on Jan 24, 2023 11:30 IST

VLOOKUP is a common Excel function that stands for ‘Vertical Lookup’. The article discusses the use of VLOOKUP in Pandas.


Pandas DataFrames are tabular data structures that store data in rows and columns, much like an Excel sheet or a CSV file. VLOOKUP is an Excel function for vertically arranged data: it looks up a value in one table and returns the corresponding value from another. In Pandas, the equivalent operation combines two DataFrames on a common column (the lookup key). This article shows how to do this with the map() and merge() methods.

For our purpose today, let’s create a sample DataFrame as shown below:

 
#Importing Pandas Library
import pandas as pd

#Creating a Sample DataFrame
df = pd.DataFrame({
    'name': ['Bob', 'Tom', 'Rob', 'Ben', 'Pam'],
    'age': [10, 12, 13, 11, 12],
    'gender': ['M', 'M', 'M', 'M', 'F'],
    'birthmonth': ['Jan', 'Aug', 'Oct', 'Dec', 'Dec']
})
df

Our dummy dataset comprises four columns: ‘name’, ‘age’, ‘gender’, and ‘birthmonth’. As you can observe, it contains both numerical and categorical variables.

Now, let’s see how to emulate the VLOOKUP function in Pandas using this dataset.

The map() method 

The pandas .map() method replaces each value in a Pandas Series (for example, a column of a DataFrame) with a corresponding value from a lookup. The lookup can be a dictionary in which each key is a value that occurs in the column and each value is what that key should be replaced with.

To understand this better, let’s create a dictionary that contains our mapping values:

 
birthmonth_map = {
    'Jan': 'January',
    'Aug': 'August',
    'Oct': 'October',
    'Dec': 'December'
}

Now, we apply the map() method to the column whose values we want to replace:

df['birthmonth'] = df['birthmonth'].map(birthmonth_map)

df

Thus, we have performed VLOOKUP using a dictionary.
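One detail worth knowing about dictionary lookups: when a value in the column has no matching key in the dictionary, map() returns NaN for that row, much as VLOOKUP returns #N/A. The sketch below illustrates this with a small hypothetical Series (months is an illustrative name, not part of the dataset above) and shows one possible way to fall back to the original value:

```python
import pandas as pd

# Hypothetical column containing a value ('Feb') that the dictionary lacks
months = pd.Series(['Jan', 'Aug', 'Feb'])
birthmonth_map = {'Jan': 'January', 'Aug': 'August'}

# Values without a matching key become NaN
mapped = months.map(birthmonth_map)

# One option: keep the original value wherever the lookup failed
with_fallback = mapped.fillna(months)

print(mapped)
print(with_fallback)
```

Whether to keep NaN or fall back to the original value depends on how you want to treat failed lookups in later steps.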

But what if the lookup data is stored in another DataFrame, as is common with tables loaded from a relational (SQL) database? In such cases, instead of working with Python dictionaries, we use the merge() method.
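For a single lookup column, map() can still be used when the lookup lives in a DataFrame, because it also accepts a Series indexed by the lookup key. This is a minimal sketch, assuming a lookup DataFrame shaped like the age-to-birth-year table used later in this article. The names children and lookup are illustrative:

```python
import pandas as pd

# Illustrative data: a main table and a separate lookup table
children = pd.DataFrame({
    'name': ['Bob', 'Tom', 'Rob', 'Ben', 'Pam'],
    'age': [10, 12, 13, 11, 12],
})
lookup = pd.DataFrame({
    'age': [10, 11, 12, 13, 14, 15],
    'birthyear': [2012, 2011, 2010, 2009, 2008, 2007],
})

# Turn the lookup table into a Series: index = key, values = result column
age_to_year = lookup.set_index('age')['birthyear']

# map() looks up each age in the Series index
children['birthyear'] = children['age'].map(age_to_year)
print(children)
```

This adds exactly one column and never changes the number of rows, which is close to how VLOOKUP behaves. It does assume the keys in the lookup table are unique.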


The merge() method

The pandas .merge() method allows us to merge two DataFrames together.

In the DataFrame we created above, the ‘age’ column determines the year in which each child was born. Let’s create another DataFrame that maps each age to a birth year:

 
#Creating another DataFrame
df2 = pd.DataFrame({
'age': [10, 11, 12, 13, 14, 15],
'birthyear': [2012, 2011, 2010, 2009, 2008, 2007]
})
df2

Now, let’s see how we can merge the two different DataFrames using the merge() method:

 
#Storing the result under a new name keeps the original df unchanged for the later examples
left_join = pd.merge(left=df, right=df2, how='left')
left_join

Note that VLOOKUP is essentially a left join between two tables, that is, the output consists of all the rows in the left table and only the matched rows from the right table.

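  • The left and right parameters specify which DataFrames to use as the left and right tables in the join. They are passed here as keyword arguments, but can also be passed positionally.
  • The how parameter sets how the tables are joined: left, right, inner, or outer. The on parameter names the key column; if it is omitted, merge() joins on the columns the two DataFrames have in common.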
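One difference from Excel is worth illustrating: VLOOKUP returns only the first match for a key, whereas a left join returns one output row per match. If the lookup table contains duplicate keys, the merged result therefore has more rows than the left table. The sketch below uses small hypothetical tables (children and lookup are illustrative names) and shows one way to mimic first-match behavior, with merge()'s validate parameter as a safety check:

```python
import pandas as pd

children = pd.DataFrame({'name': ['Bob', 'Tom'], 'age': [10, 12]})

# Hypothetical lookup table in which the key 10 appears twice
lookup = pd.DataFrame({'age': [10, 10, 12], 'birthyear': [2012, 2013, 2010]})

# A plain left join yields one row per match, so Bob appears twice
naive = pd.merge(children, lookup, on='age', how='left')

# Keeping only the first row per key mimics VLOOKUP's first-match rule
first_match = lookup.drop_duplicates(subset='age', keep='first')

# validate='many_to_one' raises an error if the right-hand keys are not unique
result = pd.merge(children, first_match, on='age', how='left',
                  validate='many_to_one')

print(naive)
print(result)
```

Which duplicate to keep is a judgment call that depends on the data; keep='first' is chosen here only because it matches what VLOOKUP does.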

Performing VLOOKUP on right join

In a right join, the output DataFrame consists of all the rows in the right DataFrame and only the matched rows from the left DataFrame. Where a row of the right DataFrame has no match, the columns that come from the left DataFrame are filled with NaN values.

 
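#Storing the result under a new name keeps the original df unchanged for the later examples
right_join = pd.merge(left=df, right=df2, how='right')
right_join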
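As a rough check on the definition above, a right join should return the same rows as a left join with the two DataFrames swapped; only the column order differs. The sketch below uses two small hypothetical DataFrames (kids and years are illustrative names, not part of the tutorial's dataset):

```python
import pandas as pd

# Hypothetical miniature tables, used only for this illustration
kids = pd.DataFrame({'name': ['Bob', 'Ben'], 'age': [10, 11]})
years = pd.DataFrame({'age': [10, 11, 14], 'birthyear': [2012, 2011, 2008]})

# Right join: keeps every row of `years` and the matched rows of `kids`
right_join = pd.merge(left=kids, right=years, on='age', how='right')

# Left join with the tables swapped: also keeps every row of `years`
swapped = pd.merge(left=years, right=kids, on='age', how='left')

# Reorder the columns so the two results can be compared side by side
swapped = swapped[right_join.columns]

print(right_join)
print(swapped)
```

In both results, the row for age 14 has NaN in the name column because no child of that age exists in kids.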

Performing VLOOKUP on inner join

By setting the how parameter to inner, the final DataFrame will contain only the rows whose key value (here, ‘age’) appears in both DataFrames.

 
inner_join = pd.merge(df, df2, on ='age', how ='inner')
inner_join

Performing VLOOKUP on outer join

By setting the how parameter to outer, the final DataFrame will contain all rows from both DataFrames. Where a row has a match in the other DataFrame, the matched values are shown; where it does not, the missing columns are filled with NaN.

 
outer_join = pd.merge(df, df2, on ='age', how ='outer')
outer_join
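With an outer join it is often useful to know which table each row came from. merge() accepts an indicator parameter that adds a _merge column for this purpose. A minimal sketch, using hypothetical tables (left_table and right_table are illustrative names):

```python
import pandas as pd

# Hypothetical tables: age 9 exists only on the left, age 14 only on the right
left_table = pd.DataFrame({'name': ['Bob', 'Zoe'], 'age': [10, 9]})
right_table = pd.DataFrame({'age': [10, 14], 'birthyear': [2012, 2008]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
outer = pd.merge(left_table, right_table, on='age', how='outer',
                 indicator=True)

# Summarize the origin of each key as a plain dictionary
origin = {int(age): str(flag) for age, flag in zip(outer['age'], outer['_merge'])}

print(outer)
print(origin)
```

Filtering on the _merge column is a convenient way to list the keys that failed to match on either side.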

Thus, we have performed VLOOKUP on four types of joins.
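To compare the four join types at a glance, the following sketch counts the rows each one produces. It rebuilds the two tables from this article (with only the columns needed) so that it can run on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Tom', 'Rob', 'Ben', 'Pam'],
    'age': [10, 12, 13, 11, 12],
})
df2 = pd.DataFrame({
    'age': [10, 11, 12, 13, 14, 15],
    'birthyear': [2012, 2011, 2010, 2009, 2008, 2007],
})

# Number of rows produced by each join type on the 'age' key
row_counts = {
    how: len(pd.merge(df, df2, on='age', how=how))
    for how in ['left', 'right', 'inner', 'outer']
}
print(row_counts)  # {'left': 5, 'right': 7, 'inner': 5, 'outer': 7}
```

The left and inner joins agree here because every age in df also appears in df2. The right and outer joins add two rows, for ages 14 and 15, which have no matching child.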

Endnotes

The Pandas library makes it incredibly easy to emulate VLOOKUP functions. Mapping and merging data are essential steps during your data preparation, especially if you’re working with normalized datasets from databases. Pandas is a very powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis.



About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...