Splitting in Decision Tree

Updated on Sep 18, 2023 16:31 IST

The below article goes through various methods to split a Decision Tree.

Machine Learning is one of the most in-demand technologies, with everyone wanting to master it and most businesses needing highly trained Machine Learning engineers. Various machine-learning techniques have been developed in this arena to tackle complicated issues quickly. These algorithms are highly automated and self-modifying, improving over time as more data is added and with minimal human intervention.

What is a Decision Tree?

Decision tree learning is one of the predictive modelling approaches used in machine learning. It uses a decision tree (as a predictive model) to go from observations about an item (represented by the branches) to conclusions about the item’s target value (represented by the leaves).

A decision tree’s main idea is to locate the features that contain the most information about the target feature and then split the dataset along their values. The feature that most reduces the uncertainty about the target feature is the most informative. The search for the most informative attribute continues until we have pure leaf nodes.

Decision Tree Terminologies

Root Node: Represents the entire sample. This will further get divided into two or more homogeneous sets.

Decision Node: A node that branches off from the root node (or from another decision node) and splits further.

Branch: Formed by splitting the tree.

To summarize, the inputs are routed through the root node of every tree. This root node is further segmented into decision nodes that are conditionally dependent on results and observations.

Splitting is the process of dividing a single node into two or more sub-nodes. A leaf node, also known as a terminal node, is a node that does not split any further. A branch, sometimes called a sub-tree, is a section of a decision tree. Pruning, the removal of sub-nodes, is the diametrically opposite concept to splitting.

Decision trees classify cases by sorting them from the root to some leaf/terminal node, with the leaf/terminal node providing the classification of the example. Each node in the tree is a test on an attribute, and each edge descending from it represents one of the test’s possible outcomes. This procedure is repeated recursively for the subtree rooted at each new node.
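The root-to-leaf sorting described above can be sketched in a few lines of Python. This is only an illustration: the tree, its feature names ("outlook", "humidity"), and the `classify` helper are made up for the example and do not come from any library.

```python
# Hypothetical hand-built tree: internal nodes test one feature,
# leaves carry a class label. Feature names and values are invented.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {"high": {"label": "no"}, "normal": {"label": "yes"}},
        },
        "overcast": {"label": "yes"},
        "rainy": {"label": "no"},
    },
}


def classify(node, example):
    """Walk from the root to a leaf, following the branch that matches
    the example's value for the feature tested at each node."""
    while "label" not in node:
        value = example[node["feature"]]
        node = node["branches"][value]
    return node["label"]


prediction = classify(tree, {"outlook": "sunny", "humidity": "normal"})
print(prediction)  # yes
```

Each loop iteration corresponds to one test on an attribute; the walk stops as soon as it reaches a leaf.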

How do you split nodes in a Decision tree?

How a node is split depends entirely on the type of target variable, so the algorithms used for classification trees differ from those used for regression trees. There are a variety of methods for selecting how to partition the data.

The essence of decision trees is that they divide data sets into sections, resulting in an inverted tree with the root node at the top. The layered model of the decision tree leads through the intermediate nodes to the final outcome.

Each node has an attribute (feature) that drives further splitting in the downward direction.

Multiple features are included in the decision-making process, and it is necessary to consider the relevance and repercussions of each feature, thereby assigning the relevant feature at the root node and traversing the node splitting downward.

Methods to split Decision Tree

A few key splitting criteria address the concerns described above. In this post we discuss Entropy, Gini Impurity, and Information Gain.

1. Entropy

Entropy is a measure of the impurity, uncertainty, or disorder of a random variable. In essence, it quantifies how unpredictable the class labels of the data points are.

If all of the elements belong to the same class, the distribution is called “pure”; if they don’t, it’s called “impure”.

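$$Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Here p_i is the proportion of elements in the set S that belong to class i, and c is the number of classes.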

To put it another way, a high degree of disorder indicates a high level of impurity. For two classes, entropy ranges from 0 to 1. With more classes it can be higher than 1 (its maximum is log2 of the number of classes), but it has the same meaning.
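As a rough illustration of these ranges, entropy can be computed directly from a list of class labels. The function name `entropy` and the toy label lists are assumptions made for this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probabilities)


pure = ["a", "a", "a", "a"]          # one class only
balanced = ["a", "a", "b", "b"]      # two classes, evenly mixed
four_way = ["a", "b", "c", "d"]      # four classes, evenly mixed

print(entropy(pure))      # 0.0
print(entropy(balanced))  # 1.0
print(entropy(four_way))  # 2.0
```

The four-class case shows entropy exceeding 1: with four evenly mixed classes the maximum is log2(4) = 2.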

2. Gini Impurity

If every node contains elements of only one class, the split is called pure (an ideal scenario). The Gini impurity (pronounced “genie”) measures the likelihood that a randomly chosen example would be incorrectly classified at a particular node if it were labelled at random according to the class distribution in that node. It’s referred to as an “impurity” measure because it shows how far the node departs from a pure split.

Gini impurity ranges from 0 up to, but never reaching, 1. A value of 0 indicates that all elements belong to the same class. The maximum is reached when the elements are spread evenly across the classes: 0.5 for two classes, and 1 - 1/k for k classes, which approaches 1 as the number of classes grows.

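$$Gini = 1 - \sum_{i=1}^{c} p_i^2$$

Here p_i is the proportion of elements in the node that belong to class i, and c is the number of classes.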

Now that we have seen what Gini Impurity is, let us see how to calculate it.

  • For each sub-node, calculate the Gini impurity from the probabilities of success (p) and failure (q): 1 - (p^2 + q^2)
  • Next, calculate the impurity of the split as the weighted average of the sub-node scores, weighting each sub-node by its share of the samples.
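The two steps above can be sketched as follows. The helper names `gini` and `weighted_gini` and the toy labels are assumptions for illustration, not part of any particular library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one node: 1 minus the sum of squared class shares."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(children):
    """Impurity of a split: child impurities weighted by child size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# A split that separates the classes perfectly, and one that does not.
perfect_split = [["yes", "yes"], ["no", "no"]]
mixed_split = [["yes", "yes", "no"], ["no"]]

print(gini(["yes", "yes", "no", "no"]))  # 0.5
print(weighted_gini(perfect_split))      # 0.0
print(weighted_gini(mixed_split))
```

The split with the lower weighted score is the better candidate, so here the perfect split would be chosen.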

3. Information Gain

Information gain is based on information theory, and the concept of entropy is key to measuring it. It is used to identify the features/attributes that convey the most information about a class. The goal is to reduce entropy on the way from the root node to the leaf nodes. Information gain is the difference in entropy before and after splitting, that is, how much the split reduces the impurity of the class labels.

Information Gain = Entropy(parent) - weighted average of Entropy(children)

When the parent node has an entropy of 1 (two evenly balanced classes), this reduces to 1 - Entropy of the split.

The entropy generally changes when we use a node in a decision tree to partition the training instances into smaller subsets. Information gain is a metric for entropy change.

The more a split reduces entropy, the higher the information gain.

Now that we have seen what Information Gain is, let us see how to calculate it.

  • For each split, calculate the entropy of each child node independently
  • Calculate the entropy of each split using the weighted average entropy of child nodes
  • Choose the split with the lowest entropy or the greatest gain in information
  • Repeat these steps to obtain homogeneous split nodes
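A minimal sketch of these steps, assuming two hypothetical candidate splits of the same parent node. The names `information_gain`, `split_a`, and `split_b` are invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probabilities)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4

# Two hypothetical candidate splits of the same parent node.
candidates = {
    "split_a": [["yes"] * 4, ["no"] * 4],                    # separates the classes
    "split_b": [["yes", "yes", "no", "no"]] * 2,             # leaves them mixed
}

gains = {name: information_gain(parent, kids) for name, kids in candidates.items()}
best = max(gains, key=gains.get)
print(gains)  # {'split_a': 1.0, 'split_b': 0.0}
print(best)   # split_a
```

In a full tree builder the same selection would be repeated on each child node until the nodes are homogeneous or another stopping rule applies.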

Now, let us compare Information Gain and Gini Impurity

Information Gain Vs Gini Impurity

We’ll go over some comparison points gleaned from the preceding discussion to assist in deciding which strategy to adopt.

  • Information gain is built on entropy, which multiplies the probability of each class by the log base 2 of that probability and sums the results with a negative sign. Gini impurity is determined by subtracting the sum of each class’s squared probability from one.
  • The Gini Impurity prefers larger partitions (distributions) and is easy to apply, whereas Information Gain prefers smaller partitions (distributions) with a wide range of values, so it is worth experimenting with both the data and the splitting criterion.
  • CART algorithms employ the Gini Index approach, whereas ID3 and C4.5 employ the Information Gain method (C4.5 in its normalized form, the gain ratio).
  • Information Gain computes the difference between entropy before and after the split, whereas the Gini index measures the impurity of the classes in a node directly from the class probabilities, without logarithms.
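To make the comparison concrete, the sketch below evaluates both measures for a two-class node as the share p of one class varies. The function names `binary_entropy` and `binary_gini` are assumptions for this illustration.

```python
from math import log2


def binary_entropy(p):
    """Entropy (in bits) of a two-class node where one class has share p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has no uncertainty
    return -p * log2(p) - (1 - p) * log2(1 - p)


def binary_gini(p):
    """Gini impurity of a two-class node where one class has share p."""
    return 1.0 - (p ** 2 + (1 - p) ** 2)


for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  entropy={binary_entropy(p):.3f}  gini={binary_gini(p):.3f}")
```

Both measures are zero for a pure node and peak at p = 0.5 (entropy at 1, Gini at 0.5), which is why they often rank candidate splits similarly.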

Conclusion

I hope this helps! In practice, both the Gini index and Information Gain are used to choose splits on real-world data. What they measure has been referred to as “data impurity” or “data distribution” in a number of definitions. With them, we can figure out which features play a smaller or larger role in decision-making.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...