Difference between loc and iloc in Pandas
loc[ ] and iloc[ ] in Pandas are used for convenient data selection and filtering in Pandas. The article covers the differences between loc and iloc in Pandas.
The Pandas library of Python is widely used for data manipulation by data scientists and data analysts. It comprises many methods and functions that help manage and analyze your data efficiently and quickly. The loc and iloc in pandas are used for slicing the data – means creating subsets of data from a Pandas Data frame. As a newbie Python programmer, one of the most fundamental questions you’re going to ask yourself is this – What’s the difference between loc and iloc in Pandas?
Let’s learn how they differ from each other.
In this blog we will cover the following sections:
- Difference between loc and iloc in Panadas
- Demo – Difference between loc and iloc – Try it Yourself
- What is loc Method?
- What is iloc Method?
- Comparisons between loc[ ] and iloc[ ]
- Endnotes
Difference between loc and iloc in Pandas data frame
Let’s summarize the difference between the loc and iloc in Pandas in a table:
loc in Pandas | iloc in Pandas |
---|---|
Label-based data selector | Index-based data selector |
Indices should be sorted in order, or loc[ ] will only select the mentioned indices when slicing | Indices need not be sorted in order when slicing |
Indices should be numerical, else slicing cannot be done | Indices can be numerical or categorical |
The end index is included during slicing | The end index is excluded during slicing |
Accepts bool series or list in conditions | Only accepts bool list in conditions |
Best-suited Python courses for you
Learn Python with these high-rated online courses
loc vs iloc – How does it differ? – Try it Yourself
Click the below colab icon to run and practice the demo
What is loc Method?
The loc[ ] is a label-based method used for selecting data as well as updating it. This is done by passing the name (label) of the row or column that we wish to select.
Syntax: loc[row_label, column_label]
Let’s understand this through an example. Let’s create a sample DataFrame using Pandas:
#Importing Pandas Libraryimport pandas as pd #Creating a Sample DataFramedf = pd.DataFrame({ 'id': [ 101, 102, 103, 104, 105, 106, 107], 'age': [ 20, 22, 23, 21, 22, 21, 25], 'group': [ 'A', 'B', 'C', 'C', 'B', 'A', 'A'], 'city': [ 'Tier1', 'Tier2', 'Tier2', 'Tier3', 'Tier1', 'Tier2', 'Tier1'], 'gender': [ 'M', 'F', 'F', 'M', 'M', 'M', 'F'], 'degree': [ 'econ', 'econ', 'maths', 'finance', 'history', 'science', 'marketing']}) df
Output:
We have created a sample student dataset comprising 6 columns – ‘id’, ‘age’, ‘group’, ‘city’, ‘gender’, and ‘degree’. As you can see, it contains both numerical and categorical variables.
Firstly, let’s set the ‘id’ column as the index using set_index():
df = df.set_index('id')df.head()
Output:
This will help us understand the difference between loc[ ] and iloc[ ] better.
Operations using loc method
- Selecting a row using loc[ ]
- Slicing using loc[ ]
- Filtering rows using loc[ ]
- Filtering columns using loc[ ]
- Updating columns using loc[ ]
Selecting a row using loc[ ]
Let’s select a row using loc[ ]:
#Selecting a row with labeldf.loc[102]
Once you set the ‘id’ column as the index, its values become the labels. So, selecting label 102 will display the record for that row.
Slicing using loc[ ]
Let’s use loc[ ] to perform slicing:
#Slicing using loc[]df.loc[101:103]
Slicing simply means selecting a range of values. Here, we have selected and displayed all records between labels 101 and 103 (end label included).
Filtering rows using loc[ ]
Let’s set a condition to filter rows:
#Selecting all rows with a given conditiondf.loc[df.age >= 22]
As you can see, all records where age is greater than or equal to 22 are displayed.
How about we set multiple conditions to filter the rows?
#Selecting rows with multiple conditionsdf.loc[(df.age >= 22) & (df.city == 'Tier1')]
Here we’ve displayed records where age is greater than or equal to 22 and the city is tier 1.
Filtering columns using loc[ ]
Let’s set a condition to filter columns:
#Selecting columns with a given conditiondf.loc[(df.gender == 'M'), ['group', 'degree']]
Here we’ve chosen to display two columns where the gender is male.
Updating columns using loc[ ]
Let’s set a condition to update columns:
#Updating a column with a given conditiondf.loc[(df.gender == 'M'), ['group']] = 'A'df
If the gender of an individual is male, then their group would be updated to ‘A’.
We can also update multiple columns by setting a condition:
#Updating multiple columns with a given conditiondf.loc[(df.gender == 'F'), ['group', 'city']] = ['B','Tier2']df
So, if the gender of an individual is female, then their group and city would be updated to ‘B’ and ‘Tier2’ respectively.
What is iloc Method?
The iloc[ ] is an index-based method used for data selection. In this case, we pass the positions of the row or column that we wish to select (0-based integer index).
Syntax: iloc[row_position, column_position]
For the given dataset, we can visualize the indices for rows and columns as follows:
Operations using iloc method
Selecting a row using iloc
Let’s select a row using iloc:
#Selecting rows with indexdf.iloc[[2,4]]
Since we’ve used the iloc[ ] method, 2 and 4 refer to the index number, and hence the second and the fourth row would be displayed, regardless of the label of the index.
Selecting rows and columns using iloc
Now let’s see how to select rows and columns using iloc:
#Selecting rows with particular index and particular columnsdf.iloc[[0,4],[1,3]]
[0,4] refers to index numbers 0 and 4 for rows and [1,3] refers to index numbers for columns.
Slicing using iloc
Let’s use iloc to select a range of rows:
#Selecting range of rowsdata.iloc[1:5]
Here, we have selected and displayed all records between indices 1 and 5 (index number 5, that is the endpoint, is not included).
We can also select a range of rows and columns:
#Selecting range of rows and columnsdf.iloc[1:3,2:4]
Here, we have displayed
- All rows between indices 1 and 3 (endpoint excluded).
- All columns between indices 2 and 4 (endpoint excluded).
Comparisons between loc and iloc
loc and iloc with callable
loc accepts a callable function as an index. The function takes one argument and returns a value valid for indexing. For demonstration:
- Selecting columns using callable
#Selecting columns using callabledf.loc[:, lambda df: ['gender', 'degree']]
- Filtering columns using callable
#Filtering data using callabledf.loc[lambda df: df.age > 24, :]<span style="background-color: inherit; font-size: inherit;"> </span style="background-color: inherit; font-size: inherit;">
iloc[ ] also accepts a callable function as an index. But iloc[ ] required a list() to convert the output of conditions into a Boolean list:
- Filtering columns using callable
#Filtering data using callable functiondf.iloc[lambda df: list(df.age > 24), :]<span style="background-color: inherit; font-size: inherit;"> </span style="background-color: inherit; font-size: inherit;">
loc and iloc are interchangeable when the labels of the DataFrame are 0-based integers
For demonstration, let’s create a new DataFrame from the above example with 0-based integers as labels:
#Create a DataFrame with 0-based integers as headers and index labelsdf.to_csv('data.csv') data = pd.read_csv('data.csv', header=None)data
When the header is specified to None, Pandas will generate 0-based integer values as headers.
Now, loc[ ] can accept integer values as labels:
data.loc[3, 5]
Here, integer values 3 and 5 are interpreted as labels of the index. Hence, in this case loc[ ] and iloc[ ] are interchangeable:
data.loc[3, 5] == data.iloc[3, 5]
Endnotes
Pandas is a very powerful data processing tool for the Python programming language. It provides a rich set of functions to process and manipulate data for analysis. Hope this article on the difference between loc and iloc in Pandas gave you relevant insight on how to use loc[ ] and iloc[ ] methods for data selection.
Top Trending Articles:
Data Analyst Interview Questions | Data Science Interview Questions | Machine Learning Applications | Big Data vs Machine Learning | Data Scientist vs Data Analyst | How to Become a Data Analyst | Data Science vs. Big Data vs. Data Analytics | What is Data Science | What is a Data Scientist | What is Data Analyst
This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio