Decision Tree Analysis
What is Decision Tree Analysis?
Decision Tree Analysis is a visual model for effective decision-making, in which decisions and their possible outcomes, consequences, and risks are mapped out to identify the best series of decisions.1 The model works by splitting data into subsets based on certain features or questions, which supports both classification and regression tasks. Decision trees are composed of nodes, which represent tests of an element or attribute, and branches, which represent the possible outcomes of each test. Often used in both decision analysis and machine learning, decision trees help break down complex decisions into manageable steps.
The Basic Idea
You and your friends have finally coordinated to take the same time off work: two weeks. Since this is such a rare opportunity to do something together, you collectively decide it is time for that big Europe trip you’ve all been promising. With so many countries, cities, and historical sites to choose from, you’re completely stuck on where to go.
You suggest using a simplified decision tree to help everyone visualize the options and make an informed choice. The decision tree starts with a simple question: “What kind of experience are we looking for?” If the group prefers a sunny vacation on the beach, the next branch points to southern Europe, with choices like Italy or Greece. If, on the other hand, everyone is more interested in exploring nature and hiking, the tree points you toward countries like Scotland or Norway.
At each step, the decision tree continues to narrow down the options, and the process continues until you reach a final destination that meets everyone’s desires. In the end, what could have been a chaotic decision process becomes a clear, visual path, helping you reach a consensus without endless debates.
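To make the trip example concrete, here is a minimal sketch of that tree written as plain Python conditionals; the questions and destinations are illustrative assumptions, not part of any real itinerary.

```python
# A toy decision tree for the Europe trip, written as nested conditionals.
# Each `if` is a decision node; each returned string is a leaf (final choice).

def pick_destination(experience: str, wants_islands: bool, wants_fjords: bool) -> str:
    """Walk from the root question down to a leaf destination."""
    if experience == "beach":                          # root: sun and beaches?
        return "Greece" if wants_islands else "Italy"  # southern Europe branch
    else:                                              # nature and hiking branch
        return "Norway" if wants_fjords else "Scotland"

print(pick_destination("beach", wants_islands=True, wants_fjords=False))  # Greece
```

Each question eliminates whole groups of options at once, which is exactly what keeps the group’s discussion focused.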
Decision Trees and Algorithms
Decision trees can reveal potential outcomes, necessary resources, and the overall utility of different strategies. Beyond machine learning, they are also widely used in fields like operations research and management science to find the most effective path to a goal. Though a decision tree can be seen as an algorithm in itself, there are many algorithmic variations for building and refining trees, each suited to different tasks.
Algorithms can be effectively visualized with decision trees because they present complex, data-heavy problems as clear, branching diagrams. This visual approach makes decision trees valuable across fields, from statistics to computer science, because they distill complicated calculations into an intuitive model. Decision trees are particularly popular in machine learning because they require relatively little mathematical background to interpret and can depict a complex problem in a single, easy-to-follow image. While decision trees were once drawn by hand, software can now generate them automatically for a wide range of applications.
Anatomy of a Decision Tree
In the data-heavy field of decision analysis, decision trees are closely related to influence diagrams; both are used to assess decisions by comparing the expected values of possible alternatives.3 Drawing or generating a tree involves several components: nodes, branches, and the root. Nodes are the shapes that symbolize subsets of decisions or data along the path to a final decision. A node is split when a question is asked. Decision trees have three types of nodes:4
- Decision (parent) nodes: Square-shaped nodes that represent points where one of several options or questions must be selected. In machine learning, these nodes mark points where a feature or attribute is evaluated to split the data.
- Chance (child) nodes: Represented as circles, these nodes mark points where an outcome is determined by chance or by events outside the decision-maker's control. They are less relevant in machine learning and are used primarily in decision theory.
- End (leaf) nodes: Triangular-shaped nodes that represent final outcomes, where no further questions are asked. In decision analysis, they indicate the end of the decision path with a particular outcome or payoff. In machine learning, leaf nodes hold the predicted value or class label once all conditions have been evaluated (e.g., "Yes" or "No").
Branches represent the sequence of decisions made or questions asked, along with their answers, on the way from the root to a leaf node, including the alternative pathways not taken. The root is the top node, where the tree begins. And just like real tree branches, decision trees may require some pruning from time to time.5
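To show how these components fit together, here is a minimal sketch of the three node types as Python classes; the class and field names are assumptions made for illustration, not a standard API.

```python
from dataclasses import dataclass

@dataclass
class LeafNode:
    """End node: a final outcome or payoff; no further questions are asked."""
    outcome: str

@dataclass
class ChanceNode:
    """Chance node: each child occurs with a probability outside our control."""
    children: list  # list of (probability, child node) pairs

@dataclass
class DecisionNode:
    """Decision node: a question whose answer selects the next branch."""
    question: str
    branches: dict  # maps each possible answer to a child node

# The root is simply the decision node at the top of the tree.
root = DecisionNode(
    question="What kind of experience are we looking for?",
    branches={
        "beach": LeafNode("Southern Europe"),
        "nature": LeafNode("Scotland or Norway"),
    },
)
```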
Decision Trees and Machine Learning
In the machine learning world, decision trees are a clear and effective way to visualize complex algorithms. A decision tree can act as an algorithm itself, splitting data into branches based on decision rules. Their popularity in machine learning owes much to their ease of use and interpretation: they require relatively little data preparation, many implementations handle missing data points automatically, and they are accessible to learners.
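As a quick sketch of that ease of use, the snippet below fits a small tree with scikit-learn (assuming it is installed) on its bundled iris dataset and prints the learned rules as text; the depth limit is an arbitrary choice made here for readability.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Raw numeric features go in as-is: no scaling or encoding needed.
data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The fitted model reads back as a plain branching diagram.
print(export_text(clf, feature_names=list(data.feature_names)))
```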
Decision trees in machine learning come with their own challenges. Pruning allows developers to cut off branches that carry less important elements from a specific tree.5 This can produce a more powerful tree that generalizes better, and it makes the tree simpler, too. Sometimes a developer deliberately grows a large tree to begin with, intending to prune it once it is ‘fully grown.’ Large trees tend to develop overly specific branches, which must then be addressed with pruning.4
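The “grow large, then prune” workflow can be sketched with scikit-learn’s cost-complexity pruning; the dataset and the ccp_alpha value below are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a full tree, then fit a pruned version of the same tree.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

print("leaves:", full.get_n_leaves(), "->", pruned.get_n_leaves())
print("test accuracy:", full.score(X_test, y_test), "->", pruned.score(X_test, y_test))
```

Pruned trees typically have far fewer leaves, and often generalize as well as, or better than, the fully grown tree.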
Decision trees are popular in machine learning for tasks like predicting numerical values (regression) and sorting data into categories (classification). For example, regression trees might predict stock prices, while classification trees can help identify spam emails. Each type uses a different criterion to split the data: regression trees typically minimize Mean Squared Error (MSE), while classification trees measure node purity, often with Gini impurity.4
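To make the two split criteria concrete, here is a minimal pure-Python sketch of Gini impurity and MSE; the toy labels and values are made up for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Classification criterion: 1 - sum(p_k^2); 0.0 means a perfectly pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def mse(values):
    """Regression criterion: mean squared error around the node's mean prediction."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(gini_impurity(["spam", "spam", "ham", "ham"]))  # 0.5: maximally mixed
print(gini_impurity(["spam", "spam", "spam"]))        # 0.0: a pure leaf
print(mse([10.0, 12.0, 14.0]))                        # ~2.67: spread around the mean
```

A split is chosen so that the resulting subsets score lower on the relevant criterion than the node being split.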
About the Author
Isaac Koenig-Workman
Isaac Koenig-Workman has several years of experience in roles involving mental health support, group facilitation, and public speaking across a variety of government, nonprofit, and academic settings. He holds a Bachelor of Arts in Psychology from the University of British Columbia. Isaac has carried out a variety of research projects at the Attentional Neuroscience Lab and the Centre for Gambling Research (CGR) in UBC's Psychology department, and has contributed to the PolarUs App for bipolar disorder with UBC's Psychiatry department. In addition to writing for TDL, he is currently a Justice Interviewer for the Family Justice Services Division of the B.C. Public Service, where he determines client needs and provides options for legal action for families going through separation, divorce, and other family law matters across the province.