
What is a Decision Tree in Machine Learning


Decision Tree

A decision tree in machine learning is a popular supervised learning method that applies to both classification and regression tasks. Decision trees follow a flowchart-like structure in which each internal node represents a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome or decision.
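For intuition, that flowchart structure can be pictured as a chain of feature tests. The toy rules below are a hand-written sketch (the weather features and thresholds are invented for illustration): each `if` is an internal node testing a feature, each branch encodes a decision rule, and each `return` is a leaf giving the outcome.

```python
def classify_day(outlook, humidity, wind):
    """Toy hand-written decision tree: should we play tennis today?"""
    if outlook == "sunny":            # internal node: test the 'outlook' feature
        if humidity > 70:             # internal node: test the 'humidity' feature
            return "don't play"       # leaf node: outcome
        return "play"                 # leaf node: outcome
    elif outlook == "rainy":          # another branch (decision rule)
        return "play" if wind == "weak" else "don't play"
    return "play"                     # leaf for the 'overcast' branch

print(classify_day("sunny", humidity=85, wind="weak"))  # -> "don't play"
```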

First, let’s remind ourselves about the difference between classification and regression.

Classification tasks involve predicting a categorical label or class for a given input; the goal is to assign each input instance to one of several predefined classes. For example, given a dataset of emails, a classification model could predict whether each email is spam or not spam. Regression tasks, on the other hand, involve predicting a continuous numerical value from the input features; the goal is to estimate the relationship between the input variables and a target variable that is typically a real-valued number. For instance, in a housing price prediction task, the aim is to predict the price of a house from features such as size, location, and number of bedrooms.
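As a quick illustration, the sketch below (assuming scikit-learn is installed; the data is a made-up toy set) fits a decision tree to each kind of task: a classifier that predicts a class label and a regressor that predicts a number.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict a categorical label (e.g. spam vs. not spam)
X_cls = [[0, 1], [1, 0], [1, 1], [0, 0]]       # toy feature vectors
y_cls = ["spam", "not spam", "spam", "not spam"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[1, 1]]))                   # -> a class label

# Regression: predict a continuous value (e.g. a house price)
X_reg = [[50], [80], [120], [200]]             # toy house sizes in square meters
y_reg = [100_000, 150_000, 220_000, 390_000]   # toy prices
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[100]]))                    # -> a numeric estimate
```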

Workings of decision trees

Decision trees are popular machine learning algorithms because they are easy to interpret and understand. They can handle both numerical and categorical data, and require relatively little data preprocessing. Below is a simple representation of a decision tree.

[Figure: a simple representation of a decision tree]

The following are the basic operating principles of decision trees; a minimal code sketch after the list illustrates them.

  1. Splitting: The tree starts at a root node, which divides into branches based on the feature that best splits the data.
  2. Recursive Partitioning: This splitting process continues recursively, creating sub-nodes until a stopping criterion is met. Typically, this happens when all instances in a node belong to the same class or when a predefined maximum depth is reached.
  3. Decision Making: When a new instance is presented to the decision tree, it traverses the tree from the root to a leaf node, following the decision rule at each internal node, and is assigned the class represented by the majority of instances in that leaf.
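To make the three steps concrete, here is a minimal, illustrative sketch (not a production implementation): it splits on the feature and threshold that most reduce Gini impurity, partitions recursively until a node is pure or a maximum depth is reached, and classifies a new instance by walking from the root to a leaf and returning that leaf's majority class. The toy housing data and the "cheap"/"expensive" labels are invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Find the (feature, threshold) pair that minimizes weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively partition until the node is pure or max_depth is reached."""
    split = best_split(X, y)
    if len(set(y)) == 1 or depth == max_depth or split is None:
        return {"leaf": Counter(y).most_common(1)[0][0]}   # majority class
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return {
        "feature": f, "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }

def predict(node, x):
    """Traverse from the root, applying each decision rule, until a leaf is reached."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

# Toy data: [size_m2, number_of_rooms] -> "cheap" / "expensive"
X = [[50, 2], [60, 3], [120, 4], [150, 5]]
y = ["cheap", "cheap", "expensive", "expensive"]
tree = build_tree(X, y)
print(predict(tree, [80, 3]))   # follows the learned rules down to a leaf
```

In practice a library such as scikit-learn handles the splitting, stopping criteria, and traversal internally; the sketch only exposes the mechanics described in the three steps above.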