Understanding Set Theory – What, Where, Why, and How do we use it in Data Science
A set in mathematics is a well-defined collection of objects, meaning that its membership does not vary from person to person. In this article, we will briefly discuss set theory, set representation, subsets, cardinality, and the union and intersection of sets with the help of examples.
Set theory is the branch of mathematics that studies well-defined collections of objects called sets; the objects in a set are called its elements.
Every dataset that we use for a machine learning model is a collection of objects of a particular kind. For example, meteorological data consists of temperature (minimum and maximum), wind speed, wind direction, visibility, sea level pressure, humidity, geographical location, precipitation, and many more variables.
Meteorologists use this data to forecast the weather of a particular region, but the task is more complex than it looks. They first pre-process the data, i.e., they:
- Classify the variables in the dataset as categorical or numerical
- Join different variables (union & intersection) to find the correlation between them
- Split the dataset into two different subsets for training and testing.
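The last pre-processing step can be sketched with Python's built-in set type. This is only an illustration under stated assumptions: the record IDs, the 30% test share, and the names `records`, `train`, and `test` are hypothetical and do not come from any real meteorological dataset.

```python
import random

# Hypothetical record IDs standing in for the rows of a dataset.
records = set(range(1, 11))

rng = random.Random(0)  # fixed seed so the split is reproducible
test = set(rng.sample(sorted(records), k=3))  # hold out 3 of 10 records
train = records - test                        # set difference: the rest

# The two subsets share no element and together cover the whole dataset.
print(train.isdisjoint(test))    # True
print(train | test == records)   # True
```

Writing the split as a set difference makes the two key properties of a train/test split explicit: the subsets are disjoint, and their union is the original dataset.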
This article will briefly discuss sets, types of sets, subsets, the cardinality of the set, the union and intersection of sets, and how they can be used in Data science.
So, let’s dive deep to learn more about set theory.
What is Set
A set in mathematics is a well-defined collection of objects, i.e., a collection whose membership doesn’t vary from person to person.
Example:
- First five Natural Numbers: {1, 2, 3, 4, 5}
- Vowels in English: {a, e, i, o, u}
Note:
- The objects of the set are called elements.
- A collection such as "the 5 best cricketers in the world" is not a set, as its members will vary from person to person.
Representation
A set can be represented in two forms:
Roster Form: In the roster form, all the elements of a set are listed. The elements are separated by commas and enclosed in {}.
Example: Vowels in English: {a, e, i, o, u}
Set-Builder Form: In the set-builder form, the set is described by a single property that all of its elements share and that no object outside the set satisfies.
Example: A = {x: x is a vowel in the English alphabet}
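The two forms map naturally onto Python: a set literal plays the role of the roster form, and a set comprehension plays the role of the set-builder form. The variable names below are illustrative only.

```python
import string

# Roster form: list every element explicitly.
vowels_roster = {"a", "e", "i", "o", "u"}

# Set-builder form: keep each letter x that satisfies the defining property.
vowels_builder = {x for x in string.ascii_lowercase if x in "aeiou"}

print(vowels_roster == vowels_builder)  # True
```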
Subset
A set B is said to be a subset of a set A if every element of B is also an element of A. In that case, A is said to be a superset of B.
Notation: if B is a subset of A, then it is represented by B ⊆ A.
Example: if A = {1, 2, 3}, then the subsets of A are {}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}.
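As a small sketch, the subset relation can be checked in Python with `issubset` (or the `<=` operator) and the superset relation with `issuperset`. The sets below reuse the example A = {1, 2, 3}; the set B is chosen here only for illustration.

```python
A = {1, 2, 3}
B = {1, 3}

print(B.issubset(A))    # True: every element of B is in A
print(B <= A)           # True: operator form of the same test
print(A.issuperset(B))  # True: A is a superset of B
print({1, 4} <= A)      # False: 4 is not in A
print(set() <= A)       # True: the empty set is a subset of every set
```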
Now, let’s look at the different types of sets.
Types of Set
- Empty Set: A set that has no elements is called an empty, null, or void set.
- Example: B = {x: x is an integer between 0.2 and 0.5}
- Singleton Set: A set that has only one element is known as a singleton set.
- Example: C = {x: x is an integer and 0.5 < x < 1.5}
- Finite and Infinite Set: A set containing a finite number of elements is known as a finite set, whereas a set with an infinite number of elements is known as an infinite set.
- Example (finite): D = {x: x is a prime number between 1 and 50}
- Example (infinite): E = {x: x is a natural number}
- Equal Set: If A and B are two sets, then A = B, if and only if:
- The number of elements in both sets is the same.
- Elements in both sets are the same.
- Example: A = {2, 3, 5, 7, 11, 13, 17, 19}, B = {11, 2, 13, 19, 7, 5, 3, 17}
- Here, A = B since both sets contain the same 8 elements; the order in which the elements are listed does not matter.
- Power Set: The set of all the subsets of a set is known as the power set.
- Example: Power set of {1, 2, 3} is {{}, {1}, {2},{3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
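A few of these types can be checked with a short Python sketch. The helper `power_set` is a hypothetical name written for this illustration (it is not a built-in); it uses `itertools.combinations` and returns frozensets because ordinary Python sets cannot contain other sets.

```python
from itertools import chain, combinations

def power_set(s):
    """Return all subsets of s as a set of frozensets."""
    items = list(s)
    all_combos = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in all_combos}

P = power_set({1, 2, 3})
print(len(P))  # 8 subsets, from {} up to {1, 2, 3}

# Equal sets: same elements, listing order is irrelevant.
A = {2, 3, 5, 7, 11, 13, 17, 19}
B = {11, 2, 13, 19, 7, 5, 3, 17}
print(A == B)  # True

# Empty and singleton sets, built over a small range of integers.
empty = {x for x in range(-10, 11) if 0.2 < x < 0.5}
singleton = {x for x in range(-10, 11) if 0.5 < x < 1.5}
print(empty == set())  # True
print(singleton)       # {1}
```

A set with k elements has 2^k subsets, which is why the power set of a 3-element set has 8 members.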
Cardinality of a Set
The number of unique elements in the set is known as the cardinality of the set.
- If any set A has k elements, then the cardinality of A is given by: n(A) = k.
Example:
- F = {1, 2, 3, 4, 5}, then the cardinality of F is 5.
- G = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, then the cardinality of G is 5.
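In Python, converting a collection to a `set` discards repeated values, so `len` of the set gives the cardinality. The list name `G_values` below is an illustrative assumption for the raw values of G before duplicates are removed.

```python
F = {1, 2, 3, 4, 5}
G_values = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
G = set(G_values)  # duplicates collapse into unique elements

print(len(F))         # 5
print(len(G_values))  # 15 values were listed
print(len(G))         # 5 unique elements, so n(G) = 5
```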
Union and Intersection of Sets
Union: Union of two sets is the set that contains all the elements of both sets. It is the smallest set that contains all the elements of both sets.
Representation: A U B = {x: x is in A or x is in B}
Example 1: A = {1, 4, 9, 16, 25}, B = {2, 3, 5, 7, 11}, then A U B = {1, 2, 3, 4, 5, 7, 9, 11, 16, 25}.
Example 2: A = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, B = {1, 4, 9, 16, 25}, then A U B = {1, 2, 3, 4, 5, 9, 16, 25}.
Properties of Union of Sets
- A U B = B U A
- A U (B U C) = (A U B) U C
- {} U A = A
- A U A = A
- If B is a subset of A, then
- A U B = A
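The union and its properties can be verified on small examples with Python's `|` operator. The sets C and S below are hypothetical and were added only to exercise the associativity and subset properties.

```python
A = {1, 4, 9, 16, 25}
B = {2, 3, 5, 7, 11}
C = {1, 2, 3}   # hypothetical third set
S = {1, 4}      # hypothetical subset of A

print(sorted(A | B))  # [1, 2, 3, 4, 5, 7, 9, 11, 16, 25]

print(A | B == B | A)              # True: commutative
print(A | (B | C) == (A | B) | C)  # True: associative
print(set() | A == A)              # True: identity
print(A | A == A)                  # True: idempotent
print(S <= A and A | S == A)       # True: union with a subset
```

A check on one example is not a proof, but it is a quick way to catch a misremembered identity.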
Intersection: Intersection of two sets is the set of all elements that are common to both the sets.
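Representation: A ∩ B = {x: x is in A and x is in B}
Example 1: A = {1, 4, 9, 16, 25}, B = {2, 3, 5, 7, 11}, then A ∩ B = {}, since the two sets have no element in common.
Example 2: A = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}, B = {1, 4, 9, 16, 25}, then A ∩ B = {1, 4}.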
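Properties of Intersection of Sets
- A ∩ B = B ∩ A
- A ∩ (B ∩ C) = (A ∩ B) ∩ C
- {} ∩ A = {}
- A ∩ A = A
- If B is a subset of A, then
- A ∩ B = B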
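Relation between the Cardinality of Union and Intersection
n(A U B) = n(A) + n(B) - n(A ∩ B)
where,
- n(A U B) is the number of elements that are in A or in B
- n(A) and n(B) are the cardinalities of A and B
- n(A ∩ B) is the number of elements common to A and B, which would otherwise be counted twice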
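As a quick sketch, the intersection can be computed with Python's `&` operator, and the relation n(A U B) = n(A) + n(B) - n(A ∩ B) can be checked with `len`. The sets are the ones from the examples in this section, with duplicates already removed.

```python
A = {1, 2, 3, 4, 5}
B = {1, 4, 9, 16, 25}
P = {2, 3, 5, 7, 11}

print(sorted(A & B))  # [1, 4]
print(B & P)          # set(): no common elements

# Cardinality of the union from the cardinalities of the parts.
lhs = len(A | B)
rhs = len(A) + len(B) - len(A & B)
print(lhs, rhs)  # 8 8
```

Subtracting n(A ∩ B) corrects for the elements 1 and 4, which are counted once in n(A) and again in n(B).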
Now, let’s take a real-life example to get a better understanding of sets.
Problem Statement: Suppose we are testing a machine learning model that predicts pregnancy in females, and for that, we have taken blood samples of both males and females. The model's output falls into 4 different subsets (actual group: model output):
- Male: Pregnant
- Male: Not Pregnant
- Pregnant Female: Pregnant
- Pregnant Female: Not-Pregnant
Now, represent the data as sets and find the true positives, false positives, true negatives, and false negatives.
Solution:
X = {set of all people (male + female) who have taken the blood test}
A = {set of Males}
B = {set of Pregnant Females}
C = {set of output: Pregnant}
D = {set of output: Not-Pregnant}
Here, A, B, C, and D are the subsets of X.
True Negative: Males who are tested not-pregnant, i.e., A ∩ D.
False Positive: Males who are tested pregnant, i.e., A ∩ C.
False Negative: Pregnant Females who are tested not-pregnant, i.e., B ∩ D.
True Positive: Pregnant Females who are tested pregnant, i.e., B ∩ C.
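Each of the four outcomes is the intersection of an actual group (A or B) with a model output (C or D). The sketch below illustrates this with made-up sample IDs; the numbers are hypothetical and are not results from any real model.

```python
# Hypothetical sample IDs, invented for illustration only.
A = {1, 2, 3, 4, 5, 6}   # males
B = {7, 8, 9, 10}        # pregnant females
X = A | B                # everyone who took the blood test

C = {2, 7, 8, 9}         # model output: pregnant
D = X - C                # model output: not pregnant

true_negative = A & D    # males tested not-pregnant
false_positive = A & C   # males tested pregnant
false_negative = B & D   # pregnant females tested not-pregnant
true_positive = B & C    # pregnant females tested pregnant

print(sorted(true_negative))   # [1, 3, 4, 5, 6]
print(sorted(false_positive))  # [2]
print(sorted(false_negative))  # [10]
print(sorted(true_positive))   # [7, 8, 9]
```

Because A and B partition X, and so do C and D, the four intersections are disjoint and together contain every sample exactly once.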
Now, we will find the true positive, false positive, true negative, and false negative rates using the cardinality of sets.
True Positive Rate (TPR) = n(B ∩ C) / n(B)
False Positive Rate (FPR) = n(A ∩ C) / n(A)
True Negative Rate (TNR) = n(A ∩ D) / n(A)
False Negative Rate (FNR) = n(B ∩ D) / n(B)
The performance of the model will be good if
- True Positive Rate and True Negative Rate are closer to 1.
- False Positive Rate and False Negative Rate are closer to 0.
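The four rates follow from set cardinalities alone. The function `confusion_rates` below is a hypothetical helper written for this sketch. It assumes the standard definitions (each rate divides an outcome count by the size of the actual group it belongs to) and that every sample receives exactly one of the two outputs. The sample IDs are invented.

```python
def confusion_rates(A, B, C, D):
    """Rates from sets: A = actual negatives, B = actual positives,
    C = predicted positive, D = predicted negative."""
    return {
        "TPR": len(B & C) / len(B),
        "FNR": len(B & D) / len(B),
        "TNR": len(A & D) / len(A),
        "FPR": len(A & C) / len(A),
    }

# Hypothetical sample IDs, invented for illustration only.
A = {1, 2, 3, 4, 5, 6}   # males
B = {7, 8, 9, 10}        # pregnant females
C = {2, 7, 8, 9}         # model output: pregnant
D = (A | B) - C          # model output: not pregnant

rates = confusion_rates(A, B, C, D)
print(rates["TPR"], rates["FNR"])  # 0.75 0.25
```

Under these assumptions TPR + FNR = 1 and TNR + FPR = 1, so pushing the true rates toward 1 is the same as pushing the false rates toward 0.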
Conclusion
In this article, we have briefly discussed set theory, its representation, subset, cardinality, union and intersection of sets with the help of examples.
We hope you liked the article.
FAQs
What is a Set?
A set in mathematics is a well-defined collection of objects that doesn’t vary from person to person. Examples: the first five natural numbers, the vowels in English.
What is a Subset?
A set B is said to be a subset of A if every element of B is contained in A.
What are the different types of sets?
Empty Set, Singleton Set, Finite and Infinite Set, Equal Set, and Power Set are some common types of sets.
Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...