Tutorial – VLOOKUP in Pandas

Updated on Jan 24, 2023 11:30 IST

VLOOKUP is a common Excel function that stands for ‘Vertical Lookup’. The article discusses the use of VLOOKUP in Pandas.

Pandas DataFrames are tabular data structures that store data in rows and columns, much like an Excel or CSV file. In Excel, VLOOKUP works on vertically arranged data: it looks up a key in one table and pulls the matching value into another. Pandas has no function named VLOOKUP, but you can get the same result by combining two DataFrames that share a common attribute (column). You can do this with the map() and merge() methods, as discussed in this article:

The map() method
The merge() method

For our purpose today, let’s create a sample DataFrame as shown below:

#Importing Pandas Library
import pandas as pd
 
#Creating a Sample DataFrame
df = pd.DataFrame({
    'name': [ 'Bob', 'Tom', 'Rob', 'Ben', 'Pam'],
    'age': [ 10, 12, 13, 11, 12],
    'gender': [ 'M', 'M', 'M', 'M', 'F'],
    'birthmonth': [ 'Jan', 'Aug', 'Oct', 'Dec', 'Dec']
})
 
df

Our dummy dataset comprises four columns: ‘name’, ‘age’, ‘gender’, and ‘birthmonth’. As you can observe, it contains both numerical and categorical variables.

Now, let’s see how we can emulate the VLOOKUP function in Pandas using this dataset.

The map() method

The Pandas .map() method replaces each value in a Series (for example, a column of a DataFrame) with a corresponding value. One way to supply that correspondence is a dictionary, where each key is a value that appears in the column and each dictionary value is what we want to replace it with.

To understand this better, let’s create a dictionary that contains our mapping values:

birthmonth_map = {
    'Jan': 'January',
    'Aug': 'August',
    'Oct': 'October',
    'Dec': 'December'
}

Now, we apply the map() method to the column whose values we want to look up:

df['birthmonth'] = df['birthmonth'].map(birthmonth_map)

Thus, we have performed VLOOKUP using a dictionary.
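
One behaviour worth knowing here: when map() is given a dictionary, any value without a matching key becomes NaN rather than being left unchanged. The following is a minimal sketch of that, using a small hypothetical Series rather than the dataset above; the 'Mar' entry and the fillna() fallback are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical Series; 'Mar' is deliberately missing from the lookup dictionary
months = pd.Series(['Jan', 'Mar', 'Dec'])
lookup = {'Jan': 'January', 'Dec': 'December'}

# Values with no matching key come back as NaN
mapped = months.map(lookup)

# One possible fallback: keep the original value wherever the lookup failed
mapped_with_fallback = mapped.fillna(months)

print(mapped.isna().tolist())
print(mapped_with_fallback.tolist())
```

Whether a failed lookup should stay as NaN or fall back to the original value depends on the data; the fallback shown is only one option.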

But what if the lookup data is stored in another DataFrame, as is often the case when the data comes from a relational (SQL) database? In such cases, instead of working with a Python dictionary, we use the merge() method.

The merge() method

The Pandas merge() method joins two DataFrames, by default on the columns they have in common.

In the DataFrame we created above, the ‘age’ column can be used to look up the year in which each child was born. Let’s create another DataFrame that maps each age to a birth year:

#Creating another DataFrame
df2 = pd.DataFrame({
    'age': [10, 11, 12, 13, 14, 15],
    'birthyear': [2012, 2011, 2010, 2009, 2008, 2007]
})
 
df2

Now, let’s see how we can merge the two different DataFrames using the merge() method:

# Left join: keep every row of df and look up birthyear by age
left_join = pd.merge(left=df, right=df2, on='age', how='left')
left_join

Note that VLOOKUP is essentially a left join between two tables: the output keeps every row of the left table and adds values only from the matching rows of the right table.

The left and right arguments choose which DataFrames to use as the left and right tables in the join. They are the first two parameters of pd.merge(), so they can also be passed positionally.
The how parameter sets how the tables are joined, for example left, right, inner, or outer. If it is omitted, merge() performs an inner join.
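
One difference from Excel is worth illustrating. With exact matching, VLOOKUP returns only the first match, whereas a left merge repeats a left row once for every matching row in the right table. The sketch below uses two small hypothetical DataFrames (people and lookup, not the ones from this tutorial) and the validate parameter of merge() to raise an error when the lookup table contains duplicate keys.

```python
import pandas as pd

# Hypothetical frames for illustration only
people = pd.DataFrame({'name': ['Bob', 'Pam'], 'age': [10, 12]})
lookup = pd.DataFrame({'age': [10, 12, 12],
                       'birthyear': [2012, 2010, 2010]})

# The duplicated key (12) makes Pam appear twice in the result
merged = pd.merge(people, lookup, on='age', how='left')
print(len(people), len(merged))

# validate='m:1' requires the keys in the right table to be unique
try:
    pd.merge(people, lookup, on='age', how='left', validate='m:1')
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True

print(duplicate_keys_detected)
```

If the lookup table is expected to hold one row per key, passing validate is a cheap way to catch unexpected duplicates before they inflate the row count.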

Performing VLOOKUP on right join

In a right join, the output DataFrame consists of all the rows in the right DataFrame and only the matched rows from the left DataFrame. For right-table rows that have no match, the columns that come from the left DataFrame are filled with NaN values.

# No `on` is given, so merge() joins on the columns the two DataFrames share
right_join = pd.merge(left=df, right=df2, how='right')
right_join
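
To make the NaN behaviour concrete, here is a self-contained sketch that rebuilds trimmed-down versions of the two tables (only ‘name’ and ‘age’ on the left; the variable names left, right, and unmatched are chosen for this illustration) and picks out the rows that exist only in the right table.

```python
import pandas as pd

# Trimmed-down copies of the tutorial's tables
left = pd.DataFrame({'name': ['Bob', 'Tom', 'Rob', 'Ben', 'Pam'],
                     'age': [10, 12, 13, 11, 12]})
right = pd.DataFrame({'age': [10, 11, 12, 13, 14, 15],
                      'birthyear': [2012, 2011, 2010, 2009, 2008, 2007]})

# Right join: every row of `right` is kept
right_join = pd.merge(left, right, on='age', how='right')

# Ages 14 and 15 have no child in the left table, so 'name' is NaN there
unmatched = right_join[right_join['name'].isna()]
print(unmatched)
```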

Performing VLOOKUP on inner join

By setting the how parameter to inner, the final DataFrame will contain only the rows whose key value (here, ‘age’) appears in both DataFrames.

inner_join = pd.merge(df, df2, on='age', how='inner')
inner_join

Performing VLOOKUP on outer join

By setting the how parameter to outer, the final DataFrame will contain the rows from both DataFrames. Where a key matches, the values from both tables are shown. Where it does not, the columns that come from the other table are filled with NaN.

outer_join = pd.merge(df, df2, on='age', how='outer')
outer_join
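
With an outer join it can be hard to tell at a glance which table each row came from. As a hedged sketch, the indicator parameter of merge() adds a ‘_merge’ column that records this; the two small frames below are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical frames: age 10 is in both, 11 only on the left, 14 only on the right
left = pd.DataFrame({'name': ['Bob', 'Ben'], 'age': [10, 11]})
right = pd.DataFrame({'age': [10, 14], 'birthyear': [2012, 2008]})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
outer = pd.merge(left, right, on='age', how='outer', indicator=True)

# Collect the source label for each age as plain strings
source_by_age = dict(zip(outer['age'], outer['_merge'].astype(str)))
print(source_by_age)
```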

Thus, we have performed VLOOKUP using four types of joins.

Endnotes

The Pandas library makes it incredibly easy to emulate VLOOKUP functions. Mapping and merging data are essential steps during your data preparation, especially if you’re working with normalized datasets from databases. Pandas is a very powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...
