Boolean Indexing in Python
In this article, we will learn the concept of Boolean indexing and the methods for Boolean indexing in pandas and NumPy. Later, we will also discuss how to filter data using Boolean indexing.
When you’re performing data analysis using Python, a common operation is filtering the data. It allows you to extract relevant patterns and insights from the data. One way to filter data is through Boolean vectors. The process of doing this is commonly known as Boolean indexing.
In this article, we will learn how Boolean indexing is performed in Python using Pandas and NumPy packages. We will be covering the following sections:
- What is Boolean Indexing?
- Methods For Boolean Indexing in Pandas
- Boolean Indexing Using NumPy
- Filtering Data Using Boolean Indexing
What is Boolean Indexing?
Boolean indexing is used to filter data by selecting subsets of the data from a given Pandas DataFrame. The subsets are chosen based on the actual values of the data in the DataFrame and not their row/column labels.
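As a minimal sketch of the idea, the snippet below filters a small DataFrame by its values. The scores_df name and its contents are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
scores_df = pd.DataFrame({"score": [55, 82, 91, 47]}, index=["a", "b", "c", "d"])

# Comparing a column to a value yields a Boolean vector, one entry per row.
passed = scores_df["score"] > 60

# Indexing with that vector keeps only the rows where it is True.
high_scores = scores_df[passed]

print(passed.tolist())                # [False, True, True, False]
print(high_scores["score"].tolist())  # [82, 91]
```

Note that the rows are chosen by their values (score above 60), not by their labels a to d.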
In Boolean indexing, we filter the values by using a Boolean vector. Let’s look at the different methods through which we perform Boolean indexing:
Methods for Boolean Indexing in Pandas
In Pandas, Boolean indexing can be performed on DataFrames in two ways:
- .loc[ ]
- .iloc[ ]
But before that, we first have to create a DataFrame whose index contains Boolean values, so that each row label is either True or False.
# Importing pandas
import pandas as pd

# Create a dictionary (named data so it does not shadow the built-in dict)
data = {'name': ["Rachel", "Monica", "Joey", "Phoebe"],
        'job': ["Doctor", "Chef", "Actor", "Singer"],
        'Age': [28, 28, 30, 31]}

# Create a DataFrame with a Boolean index
df = pd.DataFrame(data, index=[False, True, True, False])

print(df)
Output:
Now that we have created a DataFrame with a Boolean index, let’s see how we can access the DataFrame using the two methods:
Method 1 – Boolean Indexing using .loc[ ]
To access a Pandas DataFrame with a Boolean index using .loc[ ], we simply pass the Boolean label (True or False) to the .loc[ ] indexer, as shown below:
# Accessing the DataFrame using the .loc[] indexer
print(df.loc[True])
Output:
Method 2 – Boolean Indexing using .iloc[ ]
When accessing the DataFrame through the .iloc[ ] indexer, we need to keep in mind that .iloc[ ] is position-based: it expects integer positions, not index labels such as True or False.
Let’s understand this through the following example:
Code 1:
# Accessing the dataframe using .iloc[] function
print(df.iloc[False])
Output:
As expected, .iloc[ ] raises a TypeError because False is not an integer position. So, we pass the integer position of the row in the DataFrame, as shown below:
Code 2:
# Accessing the dataframe using .iloc[] function
print(df.iloc[2])
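The examples above pass a single label or position. In everyday filtering, both indexers are more often given a whole Boolean mask. The sketch below assumes the same data but with the default integer index. The names friends, older, by_loc and by_iloc are introduced here for illustration only.

```python
import pandas as pd

# Same data as the example above, but with the default integer index (an assumption).
friends = pd.DataFrame({
    "name": ["Rachel", "Monica", "Joey", "Phoebe"],
    "job": ["Doctor", "Chef", "Actor", "Singer"],
    "Age": [28, 28, 30, 31],
})

# Boolean Series: True for rows where Age is above 28.
older = friends["Age"] > 28

# .loc accepts the Boolean Series directly.
by_loc = friends.loc[older]

# .iloc is position-based, so pass a plain Boolean array rather than a Series.
by_iloc = friends.iloc[older.to_numpy()]

print(by_loc["name"].tolist())  # ['Joey', 'Phoebe']
print(by_loc.equals(by_iloc))   # True
```

Both calls return the same rows. The difference is only in what each indexer accepts as a mask.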
Boolean Indexing Using NumPy
Boolean indexing in NumPy uses a Boolean array to select elements from an array that meet a certain condition. The Boolean array is a binary mask that indicates whether each element in the array should be selected or not.
For example, you can create a Boolean array that has a True value for elements in the original array that are greater than a certain value and False values for elements that are less than or equal to that value.
Creating a Boolean Mask
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = data > 5
The mask array will have the same shape as the original array, but its values will be either True or False depending on whether the elements in the original array satisfy the condition or not. In this case, the mask array will have False values for elements 1 to 5 and True values for elements 6 to 10.
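To make this concrete, here is a small sketch that inspects the mask built from the same data array and condition. It only prints properties of the mask and selects nothing yet.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = data > 5

# The mask is a Boolean array with one entry per element of data.
print(mask.dtype)                # bool
print(mask.shape == data.shape)  # True
print(mask.tolist())
# [False, False, False, False, False, True, True, True, True, True]
```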
Using the Boolean Mask to Select Data
Once you have created the Boolean mask, you can use it to select elements from the original array that meet the condition. You can do this by indexing the original array with the Boolean mask:
filtered_data = data[mask]
The filtered_data array will only contain the elements from the original array that satisfy the condition. In this case, the filtered_data array will contain elements 6 to 10.
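The following sketch repeats the selection and adds two checks that are often useful in practice: counting the selected elements and confirming that the result is a copy of the selected values, not a view into the original array.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = data > 5
filtered_data = data[mask]

# True counts as 1, so summing the mask gives the number of selected elements.
print(int(mask.sum()))         # 5
print(filtered_data.tolist())  # [6, 7, 8, 9, 10]

# Boolean indexing returns a copy, so editing the result leaves data untouched.
filtered_data[0] = 0
print(data[5])                 # 6
```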
Combining Conditions
You can also combine multiple conditions to create a more complex mask that selects elements based on multiple criteria. For example, you can select elements that are greater than 5 and less than or equal to 8:
mask = (data > 5) & (data <= 8)
filtered_data = data[mask]
In this case, the filtered_data array will contain elements 6, 7, and 8.
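Besides & (element-wise AND), masks can be combined with | (OR) and inverted with ~ (NOT). The sketch below shows both. The variable names outer and not_large are illustrative.

```python
import numpy as np

data = np.arange(1, 11)  # the values 1 to 10

# | is element-wise OR: keep values below 3 or above 8.
outer = data[(data < 3) | (data > 8)]

# ~ is element-wise NOT: keep everything that is not greater than 5.
not_large = data[~(data > 5)]

print(outer.tolist())      # [1, 2, 9, 10]
print(not_large.tolist())  # [1, 2, 3, 4, 5]
```

The parentheses around each comparison are required, because & and | bind more tightly than comparison operators. Python's and/or keywords do not work here, since they try to reduce a whole array to a single truth value and raise a ValueError.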
Filtering Data Using Boolean Indexing
Using NumPy
Boolean Indexing can be used to filter data in Python by creating a Boolean mask, as discussed above, that corresponds to the data you want to select based on a certain condition.
Here’s an example to illustrate the use of Boolean Indexing to filter data in a NumPy array:
import numpy as np
# Create an array of numbers
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Define a condition for filtering the data
condition = data > 5

# Apply the condition to create the boolean mask
mask = np.array(condition, dtype=bool)

# Use the boolean mask to select only those elements in the data that satisfy the condition
filtered_data = data[mask]

print("Original data:", data)
print("Boolean mask:", mask)
print("Filtered data:", filtered_data)
Output:
In this example, the condition data>5 creates a Boolean mask with True values corresponding to the elements in the data array that are greater than 5. The filtered data is then obtained by indexing the original data array with this Boolean mask.
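The same approach extends to two-dimensional arrays, where a mask built from one column can select whole rows. The following is a sketch under that assumption. The table array and its (id, score) layout are hypothetical.

```python
import numpy as np

# Hypothetical table: each row is (id, score).
table = np.array([
    [1, 40],
    [2, 75],
    [3, 90],
    [4, 55],
])

# Build the mask from the score column only (one Boolean per row).
row_mask = table[:, 1] > 50

# Indexing with a 1-D mask along the first axis keeps whole rows.
selected_rows = table[row_mask]

print(selected_rows.tolist())  # [[2, 75], [3, 90], [4, 55]]
```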
Using Pandas
You can use Boolean Indexing with pandas DataFrames to filter rows, selecting only the rows of a DataFrame that meet certain conditions.
Here’s an example to illustrate the use of Boolean Indexing to filter data in a pandas DataFrame:
import pandas as pd
# create a sample DataFrame
data = {'name': ['John', 'Jane', 'Jim', 'Joan'],
        'age': [32, 28, 35, 40],
        'city': ['New York', 'London', 'Paris', 'Berlin']}
df = pd.DataFrame(data)

# create a boolean mask to select rows where the age is greater than 30
mask = df['age'] > 30

# use the boolean mask to filter the DataFrame
filtered_df = df[mask]
The filtered_df DataFrame will contain only the rows where the age is greater than 30. In this case, these are the rows for John, Jim, and Joan.
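If only some columns are needed, the same mask can be passed to .loc[ ] together with a list of column names. This is a sketch of one common pattern, not the only way to do it. The subset name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Jim', 'Joan'],
    'age': [32, 28, 35, 40],
    'city': ['New York', 'London', 'Paris', 'Berlin'],
})
mask = df['age'] > 30

# Rows where the mask is True, restricted to the name and city columns.
subset = df.loc[mask, ['name', 'city']]
print(subset)
```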
You can also combine multiple conditions to create a more complex filter. For example, you can select rows where the age is greater than 30 and the city is ‘Paris’ or ‘Berlin’:
mask = (df['age'] > 30) & ((df['city'] == 'Paris') | (df['city'] == 'Berlin'))
filtered_df = df[mask]
In this case, the filtered_df DataFrame will contain the rows for Jim and Joan, since both are older than 30 and live in Paris or Berlin.
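When a column is compared against several allowed values, Series.isin() can stand in for a chain of == comparisons joined by |. The sketch below builds the mask both ways on the same sample data and checks that they agree. The names explicit and with_isin are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Jim', 'Joan'],
    'age': [32, 28, 35, 40],
    'city': ['New York', 'London', 'Paris', 'Berlin'],
})

# Mask written with explicit comparisons joined by |.
explicit = (df['age'] > 30) & ((df['city'] == 'Paris') | (df['city'] == 'Berlin'))

# Equivalent mask using isin() for the city test.
with_isin = (df['age'] > 30) & df['city'].isin(['Paris', 'Berlin'])

print(explicit.equals(with_isin))      # True
print(df[with_isin]['name'].tolist())  # names of the matching rows
```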
Endnotes
We hope this article helped you understand Boolean Indexing in Python. It’s an essential technique for data analysis and scientific computing, and it’s widely used in many areas, including machine learning, image processing, and data visualization. Whether you’re working with small or large datasets, Boolean Indexing is a simple and efficient way to manipulate and analyze your data in Python.
Contributed By: Prerna Singh