What is a Decision Tree in ML?

Viraj_Lakshitha
5 min read · Jan 21, 2021
Classification Algorithm in Machine Learning — Decision Tree

What is a Decision Tree?

A Decision Tree is a supervised machine learning algorithm that can be used for both classification and regression problems, though it is most often used for classification. It is a tree-structured model in which internal nodes represent tests on the features of the dataset, branches represent the decision rules, and each leaf node represents an outcome. A decision tree contains two kinds of nodes: decision nodes and leaf nodes. Decision nodes perform a test and have multiple branches, whereas leaf nodes hold the output of those decisions and do not branch any further. The decisions, or tests, are performed on the basis of the features of the given dataset.
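As a minimal sketch of the classification case, assuming scikit-learn and its bundled Iris dataset, a decision tree classifier can be fitted in a few lines:

```python
# Minimal sketch: fit a decision tree classifier with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth is limited only to keep the resulting tree small and readable.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```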

Types of Decision Trees

The type of decision tree depends on the type of target variable we have. It can be one of two types:

  1. Categorical Variable Decision Tree: a decision tree with a categorical target variable is called a categorical variable decision tree.
  2. Continuous Variable Decision Tree: a decision tree with a continuous target variable is called a continuous variable decision tree. A short sketch of both cases follows this list.
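
As a hedged sketch of the two types (again assuming scikit-learn, with synthetic data made up purely for illustration), a categorical target is handled by DecisionTreeClassifier and a continuous target by DecisionTreeRegressor:

```python
# Sketch: the same tree-growing idea applied to a categorical target
# (classification) and a continuous target (regression).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Categorical target variable -> categorical variable decision tree
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
DecisionTreeClassifier(max_depth=3).fit(X, y_class)

# Continuous target variable -> continuous variable decision tree
y_reg = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)
DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
```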

Important Terminology related to Decision Trees

  1. Root Node: It represents the entire population or sample and this further gets divided into two or more homogeneous sets.
  2. Splitting: It is a process of dividing a node into two or more sub-nodes.
  3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
  4. Leaf / Terminal Node: Nodes that do not split further are called leaf or terminal nodes.
  5. Pruning: Removing sub-nodes of a decision node is called pruning. It is the opposite of splitting.
  6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
  7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children. A small example that prints these node types for a fitted tree follows this list.
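
As a minimal sketch (assuming scikit-learn and its bundled Iris dataset), a fitted tree can be printed as text to see the root, decision, and leaf nodes:

```python
# Sketch: print a fitted tree to see root, decision, and leaf nodes as text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each indented "feature <= threshold" line is a decision node (the first one
# is the root); lines ending in "class: ..." are leaf / terminal nodes.
print(export_text(tree, feature_names=load_iris().feature_names))
```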

Metrics of a Decision Tree

Gini Impurity

The Gini impurity of a node is calculated as Gini = 1 − Σᵢ pᵢ², where pᵢ is the proportion of samples in the node that belong to class i.

The Gini impurity measure is one of the methods used in decision tree algorithms to decide the optimal split from a root node and subsequent splits.
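
A minimal sketch of that calculation in plain NumPy (the function name gini_impurity is ours, not from any library):

```python
# Minimal sketch of the Gini impurity calculation: Gini = 1 - sum(p_i^2).
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node containing the given class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 1, 1]))  # 0.5 (maximally mixed, two classes)
print(gini_impurity([0, 0, 0, 0]))  # 0.0 (pure node)
```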

Entropy

The entropy of a node is calculated as Entropy = − Σᵢ pᵢ · log₂(pᵢ), where pᵢ is the proportion of samples in the node that belong to class i.

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). Entropy measures how homogeneous a subset is: it is zero when all samples belong to the same class and maximal when the classes are evenly mixed.
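
A minimal sketch of the entropy calculation, mirroring the Gini sketch above (again plain NumPy, with a function name of our own choosing):

```python
# Minimal sketch of the entropy calculation: Entropy = -sum(p_i * log2(p_i)).
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a node containing the given class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 1, 1]))  # 1.0 bit (maximally mixed, two classes)
print(entropy([0, 0, 0, 0]))  # 0.0 for a pure node (may print as -0.0)
```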

Information Gain

Information gain is the reduction in entropy (or surprise) achieved by splitting a dataset, and it is the quantity many decision-tree algorithms use to choose the best split. It is calculated by comparing the entropy of the dataset before the split with the size-weighted sum of the entropies of the subsets after the split: IG = Entropy(parent) − Σ (n_child / n_parent) · Entropy(child).
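
A minimal sketch of that comparison, reusing the entropy helper from above (the split into child label arrays is assumed to have been done already):

```python
# Sketch: information gain of a split = entropy(parent) minus the
# size-weighted entropy of the child nodes produced by the split.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """parent: labels before the split; children: list of label arrays after it."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
children = [np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])]
print(information_gain(parent, children))  # 1.0 - 0.811... ≈ 0.189
```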

Entropy vs Gini Impurity as Splitting Criteria

The Gini Index and the Entropy have two main differences:

  • For a two-class problem, the Gini index takes values in the interval [0, 0.5], whereas entropy takes values in [0, 1]. In the figure referenced below, both are plotted; the Gini index is also shown multiplied by two, which makes it clear that the differences between them are not very significant.
[Figure: Entropy vs Gini impurity]
  • Computationally, entropy is more expensive because it makes use of logarithms; consequently, the calculation of the Gini index is faster.

Therefore, we are going to analyze the impact on training time of using one criterion or the other. For that purpose, different synthetic datasets have been generated. All these datasets have 10 features, and they can be grouped into 4 groups depending on whether the features are informative, redundant, repeated, or random. A sketch of such a timing comparison follows.
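
Here is a hedged sketch of that kind of timing comparison, assuming scikit-learn's make_classification to generate a 10-feature dataset with informative, redundant, and repeated features (the sample size and any timings are illustrative, not the results of the referenced experiment):

```python
# Sketch: time the fit of the same tree with criterion="gini" vs "entropy".
import time
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with 10 features: 5 informative, 3 redundant, 2 repeated.
X, y = make_classification(
    n_samples=50_000,
    n_features=10,
    n_informative=5,
    n_redundant=3,
    n_repeated=2,
    random_state=0,
)

for criterion in ("gini", "entropy"):
    start = time.perf_counter()
    DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)
    print(criterion, f"{time.perf_counter() - start:.3f}s")
```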

Advantages of Decision Trees

1. Easy to read and interpret — One of the advantages of decision trees is that their outputs are easy to read and interpret, even without statistical knowledge. For example, when decision trees are used to present demographic information on customers, marketing staff can read and interpret the graphical representation of the data directly. The data can also be used to generate important insights on the probabilities, costs, and alternatives of the various strategies formulated by the marketing department.

2. Easy to prepare — Compared to other decision techniques, decision trees require less effort for data preparation. Users do, however, need to have the relevant information at hand in order to create new variables with the power to predict the target variable. Decision trees can also classify data without requiring complex calculations. For complex situations, users can combine decision trees with other methods.

3. Less data cleaning required — Another advantage of decision trees is that, once the variables have been created, less data cleaning is required. Missing values and outliers have less influence on a decision tree than on many other models.

Limitations of Decision Trees

  • Trees can be very non-robust. A small change in the training data can result in a large change in the tree and consequently the final predictions.
  • The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts. Consequently, practical decision-tree learning algorithms are based on heuristics such as the greedy algorithm where locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree. To reduce the greedy effect of local optimality, some methods such as the dual information distance (DID) tree were proposed.
  • Decision-tree learners can create over-complex trees that do not generalize well from the training data. Mechanisms such as pruning are necessary to avoid this problem (with the exception of some algorithms, such as the Conditional Inference approach, which do not require pruning). A minimal pruning sketch follows this list.
  • For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favour of attributes with more levels. However, the issue of biased predictor selection is avoided by the Conditional Inference approach, a two-stage approach, or adaptive leave-one-out feature selection.
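
As a minimal pruning sketch, assuming scikit-learn's cost-complexity pruning (the ccp_alpha values below are illustrative): larger ccp_alpha values prune more aggressively, trading training-set fit for a simpler tree that often generalizes better.

```python
# Sketch: cost-complexity pruning via ccp_alpha in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.0, 0.005, 0.02):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    print(f"ccp_alpha={alpha}: {tree.get_n_leaves()} leaves, "
          f"test accuracy {tree.score(X_test, y_test):.3f}")
```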

For more information:

How to build a machine learning model?

Github Gist — Decision Tree from Scratch (Explain in A-Z )

References

Wikipedia, Quantdare
