Decision Tree Analysis
What is Decision Tree Analysis?
Decision Tree Analysis is a visual model for effective decision-making, where various decisions and their possible outcomes, consequences, and risks are drawn out to pick the best series of decisions.1 The model works by splitting data into subsets based on certain features or questions, supporting both classification and regression tasks. Decision trees are composed of nodes, which represent a test of some element or attribute, and branches, which represent the possible alternative outcomes of that test. Often used in both decision analysis and machine learning, decision trees help break down complex decisions into manageable steps.
The Basic Idea
You and your friends have finally coordinated to take the same time off work—2 weeks. Since this is such a rare opportunity to do something together, you collectively decide it is time for that big Europe trip you’ve all been promising. With so many countries, cities, and historical sites to choose from, you’re completely stuck on where to go.
You suggest using a simplified decision tree to help everyone visualize the options and make an informed choice. The decision tree starts with a simple question: “What kind of experience are we looking for?” If the group prefers a sunny vacation on the beach, the following branch points to southern Europe, with choices like Italy or Greece. On the other hand, if everyone’s more interested in exploring nature and hiking, the tree points you toward countries like Scotland or Norway.
At each step, the decision tree continues to narrow down the options, and the process continues until you reach a final destination that meets everyone’s desires. In the end, what could have been a chaotic decision process is turned into a distinct, visual path, helping you reach a consensus without endless debates.
Decision Trees and Algorithms
Decision trees can reveal potential outcomes, necessary resources, and the overall utility of different strategies. Beyond machine learning, they are also widely used in fields like operations research and management science to find the most effective path to a goal. Though a decision tree can be seen as an algorithm itself, there are many algorithmic variations for building and refining them, each suited to different tasks.
Algorithms can be effectively visualized with decision trees because they present complex, data-heavy problems as clear, branching diagrams. This visual approach makes decision trees worthwhile across fields, from statistics to computer science, as they simplify complicated calculations into an intuitive model. Decision trees are particularly popular in machine learning because they involve minimal math and can depict complex problems in a single, easy-to-follow image. While decision trees were once drawn manually, artificial intelligence can now automatically generate them for various applications.
Anatomy of a Decision Tree
In the data-heavy field of decision analysis, these trees are closely related to influence diagrams, which serve a similar purpose: assessing decisions by comparing the expected values of possible alternatives.3 Drawing or generating a tree involves several components: nodes, branches, and the root. Nodes are the different shapes that symbolize subsets of decisions or data on the pathway to a final decision. A node is split when a question is asked. Decision trees have three types of nodes:4
- Decision (parent) nodes: Square-shaped nodes that represent one of many options or questions that need to be selected. Specifically, in machine learning, these nodes indicate points where a feature or attribute is evaluated to split the data.
- Chance (child) nodes: Represented as circles, these nodes show points where an outcome is determined by chance or an event occurs outside of the decision-maker's control. This is less relevant for machine learning and is primarily used in decision theory contexts.
- End (leaf) nodes: Triangular-shaped nodes that represent final outcomes, where no further questions are asked. In decision analysis, they indicate the end of the decision path with a particular outcome or payoff. In machine learning, leaf nodes hold the predicted value or class label once all conditions have been evaluated (e.g., "Yes" or "No").
Branches are the lines connecting nodes: each branch represents one possible answer to the question asked at a node, tracing alternative decision pathways down to the leaf nodes. The root is the top node, where the tree begins. Just like real tree branches, decision trees may require some pruning from time to time.5
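To make this anatomy concrete, here is a minimal sketch of the Europe-trip tree from The Basic Idea, written as plain nested conditionals in Python. The questions, destinations, and the `pick_destination` helper are our own illustrative assumptions, not part of any standard notation:

```python
# Root node: "What kind of experience are we looking for?"
def pick_destination(experience: str, wants_islands: bool) -> str:
    if experience == "beach":       # decision node: sun in southern Europe
        if wants_islands:           # decision node: island-hopping or not?
            return "Greece"         # leaf node: final destination
        return "Italy"              # leaf node
    if experience == "hiking":      # decision node: nature and trails
        if wants_islands:
            return "Scotland"       # leaf node: isles and highlands
        return "Norway"             # leaf node: fjords
    return "keep discussing"        # leaf node: no consensus yet

print(pick_destination("beach", wants_islands=True))    # -> Greece
print(pick_destination("hiking", wants_islands=False))  # -> Norway
```

Each `if` is a node where a question splits the options, each branch of the conditional is a branch of the tree, and each `return` is a leaf.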
Decision Trees and Machine Learning
In the machine learning world, decision trees are a clear and effective way to visualize complex algorithms. A decision tree can act as an algorithm itself, splitting data into branches based on decision rules. Overall, their popularity in machine learning can be attributed to ease of use and interpretation: they require relatively little data preparation, many implementations handle missing data points automatically, and they are accessible to newcomers.
Unique problems can arise when it comes to decision trees and machine learning. Pruning allows developers to get rid of branches that carry little predictive weight.5 This can create a more powerful tree, where branches with less important elements are cut off, and a simpler one, too. Sometimes a developer may deliberately grow a large tree to begin with, intending to prune it once it is ‘fully grown.’ Large trees tend to sprout overly specific branches—which then must be addressed with pruning.4
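As a sketch of this grow-then-prune workflow, the following example uses scikit-learn's cost-complexity pruning. This is one common implementation choice, and the synthetic data is purely illustrative; the sources above don't prescribe a specific library:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 'Fully grown' tree: every branch expanded until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Candidate pruning strengths; larger alpha removes more branches.
alphas = full_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Keep the pruned tree that scores best on held-out data.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in alphas),
    key=lambda t: t.score(X_test, y_test),
)
print(f"leaves before pruning: {full_tree.get_n_leaves()}, "
      f"after: {best.get_n_leaves()}")
```

Sweeping the candidate alphas and scoring each pruned tree on held-out data is a simple way to decide how much of the tree to cut.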
Decision trees are popular in machine learning for tasks like predicting numerical data (regression) or sorting data into categories (classification). For example, regression trees might predict stock prices, while classification trees help identify spam emails. Each type uses a different method to split data: regression uses Mean Squared Error (MSE) for numbers, while classification measures purity, often with Gini impurity.4
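Both criteria are simple enough to compute by hand. A minimal sketch, written from the standard formulas rather than from any code in the cited sources:

```python
import numpy as np

def gini_impurity(labels: np.ndarray) -> float:
    """Classification criterion: 1 - sum(p_k^2). 0.0 means a pure node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def mse(values: np.ndarray) -> float:
    """Regression criterion: mean squared error around the node's mean."""
    return float(np.mean((values - values.mean()) ** 2))

print(gini_impurity(np.array(["spam"] * 4)))         # 0.0: perfectly pure
print(gini_impurity(np.array(["spam", "ham"] * 2)))  # 0.5: maximally impure
print(mse(np.array([101.0, 99.0, 100.0])))           # spread of prices in a node
```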
A machine is not a genie, it does not work by magic, it does not possess a will, and … nothing comes out which has not been put in, barring, of course, an infrequent case of malfunctioning. … The “intentions” which the machine seems to manifest are the intentions of the human programmer…
— Arthur Samuel, Professor and Computer Scientist
Key Terms
Nodes: The various shapes in a decision tree that indicate the types of decisions being made. Decision trees have decision, chance, and end nodes to categorize the decisions in a given scenario.
Branches: The lines in decision trees that represent how different nodes connect to one another in a series of questions being asked.
Root: The top node of the tree, which introduces the initial decision or question and from which all branches and nodes grow.
Pruning: Cutting back extra branches or nodes in a decision tree to make it simpler and more accurate for future predictions. Pruning can happen before the tree is fully grown (pre-pruning, for example via chi-square tests during predictive modeling) or after the tree is fully developed (post-pruning).
Decision analysis: The discipline comprising the philosophy, methods, and practice of making important decisions in a formal, professional way.8 The field includes the various approaches and tools for assessing decisions and choosing the best course of action. Decision trees are one decision analysis method.
Machine Learning: A branch of computer science that uses data and algorithms, allowing artificial intelligence to imitate and learn intelligent human behavior. It is how a computer may learn an ability without being explicitly programmed to do so.
Regression Tree: A predictive method used in machine learning for numerical data, such as how much your trip to Europe might cost you and your friends.
Classification Tree: A predictive method used in machine learning for categorical data, such as comparing which region of Europe you and your friends want to see.
CART: One of the earliest and most influential published decision tree algorithms, standing for Classification and Regression Trees. CART remains one of the most used algorithms in decision tree data analysis today.
Algorithmic Decision Trees: In machine learning, decision tree algorithms (like CART, C4.5, and ID3) are used to automatically build trees by finding the best ways to split data based on certain criteria. These algorithms optimize the tree structure for predictive accuracy, often by minimizing metrics like entropy and Gini impurity or by maximizing variance reduction; a minimal sketch follows this list. They are computational processes used to classify data or make predictions.
Non-Algorithmic Decision Trees: Decision trees can also be hand-crafted decision aids or flowcharts that map out decision-making processes. These are often used in business, medicine, and management to guide human decision-making. They may not involve an algorithm but simply represent a logical sequence of questions leading to outcomes based on expert knowledge or predefined criteria.
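For contrast with hand-crafted trees, here is the minimal sketch of an algorithmic tree referenced above. It assumes scikit-learn, whose `DecisionTreeClassifier` is in the CART family, and the well-known iris dataset; both choices are ours, for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# The algorithm finds the best splits automatically from the data.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits as a readable, tree-shaped rule set.
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))
```

The printed output is itself a small decision tree: learned automatically, but readable just like a hand-drawn flowchart.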
History
Before decision trees, the discipline of decision analysis had to develop. This systematic approach to evaluating important decisions for businesses emerged with Ronald A. Howard, who originated the concept in 1964.8
Decision analysis is interdisciplinary, drawing from fields like psychology, business management, and economics when investigating a given decision scenario. Decision trees are just one possible visual representation stemming from decision analysis. These tools help frame alternative decision pathways and possible uncertainties, and measure whether the initial goals of a decision are met once final outcomes arise.
The first decision tree emerged in 1963 at the Institute for Social Research at the University of Michigan, where two researchers, Morgan and Sonquist, were credited with the decision tree analysis model.9 The development of the decision tree came from the wish to split data into two subsets—or nodes—when analyzing complex social science data.
In developing the Automatic Interaction Detection (AID) algorithm, Morgan and Sonquist were looking into factors that determine social conditions.4 The pair of researchers saw decision trees as a means to analyze survey data, and to find better ways to model it. With many variables of what leads to someone’s social conditions (e.g., age, ethnicity, profession), they were looking for better tools to organize data. For instance, a decision tree could illustrate why a group of homeowners moved into a new area, relative to their unique social conditions.
Decision trees soon branched into psychology, marked by an early publication of the model in 1966 at the Institute of Computing Science at the Poznań University of Technology.10 Researchers soon realized that decision tree analyses could model human learning—a tree was a great way to depict how students learn concepts in the classroom. The work of Hunt and colleagues brought forth a computer program named ‘CLS,’ or Concept Learning System. By carrying an idea from psychology into a working computer program, they showed decision trees to be an innovation in the programming world as well.
With so many ideas surrounding tree types and applications, many felt a need to organize them—this led to the birth of the Classification and Regression Tree (CART) algorithm in the 1970s.9 The first classification decision tree appeared in 1972 with the THAID project, and CART itself emerged in 1974. The term ‘CART’ may refer to either type, classification or regression, both of which use the same tree-like diagram. It wasn't until a decade later, in 1984, that the first CART decision tree software was published, a release seen as revolutionary for the world of algorithms and decision tree analysis. The software included methods for pruning away unneeded subtrees and for choosing the best tree for a given data set and analysis.
People
James Newton Morgan
An American professor of economics at the Institute for Social Research at the University of Michigan from 1949 to 1987.11 Morgan, along with Sonquist, is often credited with the first decision tree model in the 1960s. He contributed greatly to the field, writing 30 books and contributing to many others. In 1977, he became the first person at the University of Michigan to receive the W.S. Woytinsky Lectureship Award.
John A. Sonquist
An American professor known for his pioneering work in computer science and the social sciences alike. He spent much of his career directing Computer Services at the Institute for Social Research at the University of Michigan and was also influential in the field of sociology.12 With Morgan, he released the first decision tree model in 1963.
Arthur Samuel
An American computer scientist regarded as a pioneer in both computer gaming and artificial intelligence. Samuel is credited with popularizing the term “machine learning” in 1959 and with writing the first checkers program to run on IBM's first commercial computer. He earned his master's degree in Electrical Engineering from MIT in 1926.
Ronald A. Howard
An American professor and engineer known for directing decision analysis at Stanford, after coining the term in 1966. He was the Director of the Decisions and Ethics Center at Stanford and an emeritus professor in the Department of Engineering-Economic Systems. He earned his ScD in Electrical Engineering from MIT in 1958.
Impacts
Though decision trees have importance in the programming and computer science world, talk of their general use can feel abstract—it is key to see how they are applied in tangible contexts. As discussed, decision trees have been applied to topics such as social problems, service industries, and the pharmaceutical industry.4 Two specific contexts are student satisfaction and heart disease diagnosis.
What Satisfies Students? Trees Might Tell Us!
Every student has had experiences of dissatisfaction in the classroom, whether it be with a bad teacher or a boring class. For keen learners, a great feeling is the “a-ha” moments of satisfaction when learning something new. In 2004, Thomas and Galambos were curious to look into how students’ traits and experiences may impact their satisfaction in school by using a decision tree.13 In their study, the authors applied one specific decision tree algorithm, CHAID, to analyze data on the opinions of students. The authors surveyed nearly 1700 undergraduate students in the Spring of 2000 at an undisclosed public research university.
This survey asked questions about things like satisfaction with campus, self-perceptions of growth, and why students chose their given school. Thomas and Galambos found three main measures tied to student satisfaction: academic experiences in general, social integration and pre-enrollment views, and campus services and facilities. Data mining using tree analyses, in particular, helped break down specific reasons for student satisfaction, one of the key factors being faculty preparedness—whether or not your teacher came to class prepared. Here, decision trees show us what really satisfies student learning in the classroom and, more broadly, on campus.
Better Heart Disease Diagnoses Using Trees
Decision trees may be helpful for important health decisions, such as how to diagnose heart disease. Heart disease is a leading cause of death worldwide,14 and data mining techniques like decision trees may help streamline its diagnosis. Shouman et al. (2011) investigated a variety of decision trees to improve performance in heart disease diagnosis, comparing trees built with three different splitting criteria: Gain Ratio, Gini Index, and Information Gain.
Shouman and colleagues systematically tested the three criteria to identify a robust and accurate model for diagnosing heart disease. Their analysis showed that the Gain Ratio method was the most accurate. Although decision trees can be a powerful tool for health diagnoses, selecting the most accurate model is crucial for disease prognosis—often making a significant difference in patient outcomes.
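Shouman et al.'s own code is not reproduced here, but the criteria they compared follow standard formulas. A sketch of two of them, Information Gain and Gain Ratio, using a small hypothetical node of patients (Gini Index appears in the sketch earlier in this article):

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Shannon entropy of a node's class labels, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent: np.ndarray, children: list) -> float:
    """Entropy reduction achieved by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

def gain_ratio(parent: np.ndarray, children: list) -> float:
    """Information gain normalized by the split's own entropy; penalizes
    splits that shatter the data into many tiny branches."""
    n = len(parent)
    split_info = -sum((len(c) / n) * np.log2(len(c) / n) for c in children)
    return information_gain(parent, children) / split_info

# Hypothetical node of patients: 1 = heart disease, 0 = healthy.
parent = np.array([1, 1, 1, 0, 0, 0, 0, 0])
left, right = parent[:4], parent[4:]  # a candidate split, e.g. by age
print(information_gain(parent, [left, right]))  # ~0.55 bits of entropy removed
print(gain_ratio(parent, [left, right]))        # same here, as split_info = 1
```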
Controversies
Decision trees may seem easy to look at and understand, but they are far from perfect. Several of their issues stem from just how much trees, and the algorithms behind them, have multiplied since the 1960s.
Too Many Branches on These Trees
A common problem with decision trees is the number of branches they end up having: with smaller data sets in particular, trees are prone to overfitting.7 Overfitting happens when the model fits the training data too closely, undermining accuracy when predicting future outcomes.7 When this happens, there are simply too many nodes and branches on the tree—making it too complex to interpret, even with software. This has implications for how generalizable a model might be: the tree can fit the data it was given but cannot produce accurate predictions for new data.
As mentioned before, just like a real tree, we can prune decision trees. Pruning may solve the issue of having too many nodes, as specific parts of the tree are removed. This allows a researcher or developer to ax parts of the tree that appear redundant. A pruned decision tree is thinner and simpler, with less variance, which can lead to more precise decisions at the ends of its branches.
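A small illustration of both points, assuming scikit-learn and synthetic, deliberately noisy data: the unconstrained tree memorizes its training set, while a depth-limited (pre-pruned) tree typically generalizes better:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y injects label noise, which an unconstrained tree will memorize.
X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for name, tree in [("unpruned", DecisionTreeClassifier(random_state=1)),
                   ("depth-limited", DecisionTreeClassifier(max_depth=3,
                                                            random_state=1))]:
    tree.fit(X_tr, y_tr)
    print(f"{name}: train accuracy {tree.score(X_tr, y_tr):.2f}, "
          f"test accuracy {tree.score(X_te, y_te):.2f}")
```

The gap between training and test accuracy for the unpruned tree is the overfitting described above; limiting depth trades a little training fit for better generalization.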
Now that We’ve Grown our Trees, Which Algorithm Do We Pick?
In an overview marking the 50th anniversary of regression (decision) trees, Loh dove into how trees have been applied to many statistical models, software, and the scientific community at large, inviting various authors with decision tree experience to comment. Two researchers, Rusch and Zeileis, noted the abundance of algorithms used in decision tree analyses across statistics, machine learning, and other fields.15 In their digging, the authors found at least 83 unique tree algorithms—and suggested they were only scratching the surface.
Rusch and Zeileis then highlighted that while this versatility across fields is a strength, it is challenging to keep up with the growth of these algorithms—let alone to pick the “right” algorithm for a given problem. Returning to our heart disease example: those authors compared only three criteria, but what if they had to compare all 83 found here? Rusch and Zeileis suggest one remedy: a public archive of tree algorithms, including free supplementary materials on the history of each algorithm.
Case Studies
Mental Health Issues in Young People
We are experiencing a mental health crisis, and it is affecting young people especially, who face a plethora of pressures going into adulthood. Decision trees need not be only for the developers and computer programmers of the world; many mental health and “human” problems may benefit from decision trees, too. With digital solutions on the rise for predicting mental health problems, why not make trees part of the toolkit?
Huang et al. (2022) were curious to see whether decision trees could be a tool for early detection of mental health problems among college students.16 The authors had seen in previous research that a decision tree model using the ID3 algorithm could do exactly this through a preventative lens—perhaps decision tree identification of warning signs for young adults’ mental health problems could lead to early treatment.17
Between 2018 and 2021, over 1300 students at the University of Guilin completed a general psychological health survey designed by the authors. Huang et al. focused mainly on questions about directly observable emotional states, to avoid the ambiguity of subjective answers. This first survey allowed the authors to screen for attributes most likely to be related to psychological crises.
In anticipation of possible psychological crises, the authors constructed a prediction model with eight key attributes for their decision tree, asking about school attendance, medication history, campus bullying, and other factors. With this tree and its results, Huang et al. discovered that 261 students out of nearly 2000 participants presented warning signs of possible mental health problems.
China is not the only place where young people face mental health problems—so do the youth of Canada, whose mental health status has also been examined using decision trees. Battista et al. (2023) reasoned that a decision tree analysis could capture complex relationships in population health data, such as various constructs of mental health, better than other models (e.g., regression methods).18
Drawing on data from 74,501 students in 136 schools across Canada in the COMPASS study, Battista et al. examined students’ anxiety, depression, and psychosocial well-being relative to sociodemographic and health behavior predictors. What they found, thanks to the decision trees themselves, may be one of the most pertinent outcomes for mental health: the trees were effective at identifying youth in high-risk subgroups that other methods may have missed.
A lesson decision trees offer about young people’s mental health, whether in China, Canada, or elsewhere, is just how complex and nuanced these issues are becoming. Decision trees help us with complex problems, and perhaps it is precisely the complexity of today’s mental health issues that requires tools like these trees to identify them long before they become crises.
Related TDL Content
Machine Learning & Personalized Interventions: David Halpern
While we have seen decision trees’ popularity in machine learning, both of these tools have been used in ways that help individuals and organizations. Take a deeper dive into machine learning and how it may be individualized to help in situations of violence, the many faces of nudging, and more in this interview with Behavioral Insights Team Chief Executive David Halpern.
Beyond Access: How can we leverage digital solutions to improve mental health interventions?
Decision trees may have the powerful ability to identify complex issues before crises occur. If trees are a modern way to identify such problems, the question of modern solutions remains. This piece looks at how well digital mental health interventions (e.g., apps) work once the people who need them have access.
Sources
- De Ville, B. (2013). Decision trees. WIREs Computational Statistics, 5(6), 448-455. https://doi.org/10.1002/wics.1278
- Decision trees: Complete guide to decision tree analysis. (2023, August 6). Explorium. https://www.explorium.ai/blog/machine-learning/the-complete-guide-to-decision-trees/
- Detwarasiti, A., & Shachter, R. D. (2005). Influence diagrams for team decision analysis. Decision Analysis, 2(4), 207-228. https://doi.org/10.1287/deca.1050.0047
- Kempf-Leonard, K. (2005). Encyclopedia of social measurement. Elsevier.
- DB2 for Linux UNIX and Windows 10.5.0. (n.d.). IBM - United States. https://www.ibm.com/docs/en/db2/10.5?topic=view-pruning-decision-trees
- Pandey, M., & Sharma, V. K. (2013). A decision tree algorithm pertaining to the student performance analysis and prediction. International Journal of Computer Applications, 61, 1-5. http://dx.doi.org/10.5120/9985-4822
- Song, Y. Y., & Lu, Y. (2015). Decision tree methods: Applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2), 130-135. https://doi.org/10.11919/j.issn.1002-0829.215044
- Decision analysis (DA): Definition, uses, and examples. (2010, December 15). Investopedia.
- Loh, W. (2014). Fifty years of classification and regression trees. International Statistical Review, 82(3), 329-348. https://doi.org/10.1111/insr.12016
- Hunt, E., Marin, J., & Stone, P. (1966). Experiments in induction. Academic Press.
- James N. Morgan papers, 1939-2010 (majority within 1947-1999). (n.d.). University of Michigan Finding Aids. https://findingaids.lib.umich.edu/catalog/umich-bhl-2014108
- In memoriam: John A. Sonquist, 1931-2017. (2019, March 29). The Santa Barbara Independent. https://www.independent.com/2017/10/26/memoriam-john-sonquist-1931-2017
- Thomas, E. H., & Galambos, N. (2004). What satisfies students? Mining student-opinion data with regression and decision tree analysis. Research in Higher Education, 45(3), 251-269. https://doi.org/10.1023/b:rihe.0000019589.79439.6e
- Shouman, M., Turner, T. L., & Stocker, R. (2011). Using decision tree for diagnosing heart disease patients. Conferences in Research and Practice in Information Technology Series, 121, 23-30.
- Rusch, T., & Zeileis, A. (2013). Gaining insight with recursive partitioning of generalized linear models. Journal of Statistical Computation and Simulation, 83(7), 1301-1315.
- Huang, Y., Li, S., Lin, B., Ma, S., Guo, J., & Wang, C. (2022). Early detection of college students' psychological problems based on decision tree model. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.946998
- Zhang, F. H., Fang, L. T., & Gao, P. (2008). Research on psychological crisis and its intervention. World Sci. Technol. Res. Dev., 30, 504-508. https://doi.org/10.16507/j.issn.1006-6055.2008.04.026
- Battista, K., Diao, L., Patte, K. A., Dubin, J. A., & Leatherdale, S. T. (2023). Examining the use of decision trees in population health surveillance research: An application to youth mental health survey data in the COMPASS study. Health Promotion and Chronic Disease Prevention in Canada, 43(2), 73-86. https://doi.org/10.24095/hpcdp.43.2.03
About the Author
Isaac Koenig-Workman
Isaac Koenig-Workman has several years of experience in roles related to mental health support, group facilitation, and public speaking across government, nonprofit, and academic settings. He holds a Bachelor of Arts in Psychology from the University of British Columbia. Isaac has completed a variety of research projects at the Attentional Neuroscience Lab and the Centre for Gambling Research (CGR) in UBC's Psychology department, and contributed to the PolarUs App for bipolar disorder with UBC's Psychiatry department. In addition to writing for TDL, he is currently a Justice Interviewer for the Family Justice Services Division of the B.C. Public Service, where he determines client needs and provides options for legal action for families going through separation, divorce, and other family law matters across the province.