Linear Discriminant Analysis (LDA)

What is Linear Discriminant Analysis?

Linear discriminant analysis (LDA), also known as normal discriminant analysis (NDA) or discriminant function analysis (DFA), is a powerful dimensionality reduction technique widely used in machine learning and statistics. LDA enhances classification accuracy by identifying the optimal linear combinations of features that separate different classes within a dataset. By reducing complexity while preserving critical class distinctions, LDA improves model performance in applications such as pattern recognition, face recognition, and text classification.

[Figure: two plots of circles and triangles; the plot on the right includes a diagonal line separating the two groups, illustrating LDA.]

The Basic Idea

After many years of running the show at the family restaurant, your pizzeria has become a local favorite. Like any restaurateur, you're always trying to find new customers. Though (almost) everyone likes pizza, you want to definitively answer the question: “Which type of customer is most likely to eat my pizzas?” With participation from both regulars and first-time pizza eaters, you begin asking customers to complete a simple survey about themselves. With hundreds of customer data features collected, you are in search of the right analysis to discover what type of person truly wants to buy your pizza. In this scenario, linear discriminant analysis may be the technique for you.

Linear discriminant analysis (LDA), also referred to as normal discriminant analysis (NDA) or discriminant function analysis (DFA), is a popular technique in machine learning and statistics used to reduce the number of dimensions in a dataset while maintaining the ability to distinguish between different classes.1 The main goal of LDA is to find the linear combinations of features that best separate two or more classes in the data. Unlike other dimensionality reduction methods such as principal component analysis (PCA), which focuses on maximizing overall variance,1 LDA aims to maximize how well classes can be separated by a linear projection.

LDA is a generative model, meaning it estimates how the data is distributed within each class. Using Bayes' theorem, it calculates the probability that a new data point belongs to each class and assigns it accordingly.2 In other words, LDA works with conditional probabilities: the probability that one event occurs given that another has already occurred. If we were to use an LDA algorithm at our pizzeria, the application of Bayes’ theorem would help us check whether our assumptions about likely customer types are accurate or not.
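
To make this concrete, here is a minimal sketch of LDA used as a Bayes-based classifier with scikit-learn. The customer features (weekly visits and average spend) and the class labels are made up purely for illustration; the calls shown are standard scikit-learn, not anything specific to this article.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical customer data: [weekly visits, average spend per visit].
X = np.array([[1, 12.0], [2, 15.0], [1, 10.0],   # occasional diners (class 0)
              [4, 22.0], [5, 25.0], [6, 21.0]])  # regulars (class 1)
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis().fit(X, y)

new_customer = np.array([[3, 18.0]])
# Posterior probabilities P(class | features), obtained via Bayes' theorem from
# the fitted class-conditional Gaussians and the class priors.
print(lda.predict_proba(new_customer))
print(lda.predict(new_customer))
```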

In practice, linear discriminant analysis finds a linear combination of characteristics that separates two or more types of objects or events. As a dimensionality reduction method, LDA simplifies complex datasets by transforming data with multiple features (dimensions) into a lower-dimensional space. It does this while preserving the ability to distinguish between different classes, making classification more efficient and reducing computational complexity. Since LDA is adept at reducing dimensions, it can be applied to multi-class classification problems, in contrast to methods like logistic regression, which in its standard form handles only binary classification.2 Due to its versatile nature, it is common to use LDA as a means to improve the abilities of other classification algorithms, such as decision trees.

LDA vs. PCA

It can be a bit tricky to understand how LDA is distinct from a similar approach called principal component analysis (PCA), so let’s stick with the pizza example to make sense of these analyses.3 PCA pays no attention to class labels: it would compress the customer survey into the directions where responses vary the most overall, whether or not those directions tell us anything about who actually buys pizza. LDA, by contrast, is supervised. It uses the labels (say, repeat customers versus one-time visitors) and keeps the directions that best separate those groups, even if they account for less of the overall variance.
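
Here is a minimal sketch of that contrast using scikit-learn. The well-known Iris dataset simply stands in for our pizza survey, and the two-component setting is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores the labels: it keeps the directions of maximum overall variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses the labels: it keeps the directions that best separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2), but the axes mean different things
```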

Eigenvectors and Eigenvalues

The goal of LDA, especially when we interpret it as a technique for reducing dimensions, is to separate the data along straight lines (linear directions). In the math of linear algebra, this is accomplished using eigenvectors and eigenvalues.2 To understand this, let’s return to our pizza example. Once you’ve collected data on your customers, it's not as simple as picking pepperoni versus margherita: the data is all over the place, and you need a scatterplot to sort it out. Eigenvectors give us the directions of this scatterplot, indicating the directions along which the data is best separated, whereas eigenvalues tell us how important each of those directions is. The higher the eigenvalue, the more discriminative its corresponding eigenvector.

When we conduct an LDA, these eigenvector calculations are based on two scatter matrices computed from the collected data (a small numerical sketch follows the list below):2

  1. Within-class scatter matrix: Represents how spread out, or varied, the data points are within each class. LDA tries to minimize this scatter to ensure that points within the same class remain close together. 
  2. Between-class scatter matrix: Represents the variation between different class means. LDA tries to maximize this scatter to push class means farther apart, improving separation.
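
As a rough illustration, here is how the two scatter matrices can be computed with NumPy. The two-feature, two-class dataset is made up for the example; real survey data would have many more features and rows.

```python
import numpy as np

# Made-up data: two classes, two features each.
X0 = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2]])   # class 0
X1 = np.array([[4.0, 4.5], [4.2, 4.8], [3.8, 4.2]])   # class 1

mean0, mean1 = X0.mean(axis=0), X1.mean(axis=0)
overall_mean = np.vstack([X0, X1]).mean(axis=0)

# Within-class scatter: spread of points around their own class mean (to minimize).
S_W = (X0 - mean0).T @ (X0 - mean0) + (X1 - mean1).T @ (X1 - mean1)

# Between-class scatter: spread of the class means around the overall mean (to maximize).
S_B = (len(X0) * np.outer(mean0 - overall_mean, mean0 - overall_mean)
       + len(X1) * np.outer(mean1 - overall_mean, mean1 - overall_mean))

print("Within-class scatter:\n", S_W)
print("Between-class scatter:\n", S_B)
```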

How do you prepare to conduct an LDA?

The raw survey data won’t give us the answers we are looking for. If we really want to figure out who is buying our pizza, we need to sort the data first. Here are some best practices to follow before conducting a linear discriminant analysis:2

  1. Preprocess the data so it is approximately normal and centered: LDA assumes that the data follows a normal distribution, and mean-centering the data helps compute the scatter matrices correctly.
  2. Pick the right number of dimensions for the lower-dimensional space: Choose the number of discriminants by keeping the most informative eigenvalues or by testing performance in lower dimensions. We will come back to this “lower” dimensional space when we actually do the LDA.
  3. Regularize the chosen model: Regularization helps avoid overfitting, which is when a statistical model fits its training data so closely that it fails to generalize to new data or make reliable predictions.
  4. Apply cross-validation to assess how well the model is working (see the sketch after this list): One way to assess classifiers is with a confusion matrix, which checks whether a classifier is getting confused about the classes and naming one incorrectly as another. We know not everyone likes Hawaiian pizza.
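
The sketch below strings these practices together with scikit-learn. The Wine dataset is just a stand-in for our survey data, and the specific choices (standardization for centering and scaling, shrinkage as the regularizer, five-fold cross-validation) are illustrative assumptions rather than fixed rules.

```python
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix

X, y = load_wine(return_X_y=True)

# Center and scale the features, then fit a regularized LDA.
# shrinkage="auto" regularizes the covariance estimate (needs the lsqr or eigen solver).
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
)

# Cross-validation gives an estimate of accuracy on unseen data.
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())

# Confusion matrix: rows are true classes, columns are predicted classes.
y_pred = cross_val_predict(model, X, y, cv=5)
print(confusion_matrix(y, y_pred))
```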

How does LDA actually work?

When we apply an LDA, the data is projected onto a lower-dimensional space in a way that maximizes the separation between classes. This happens when LDA identifies a set of linear discriminants that maximize between-class variance relative to within-class variance.1 A simpler way to understand this is that LDA discovers the directions that best separate the various classes in the data.

There are three key steps of linear discriminant analysis from a computational perspective (a numerical sketch follows the list):2

  1. Find the between-class variance: How separate the classes are, also known as the distance between the class means.
  2. Find the within-class variance: The distance between class means and individual samples. 
  3. Project the data into a lower-dimensional space: Construct the lower-dimensional space so that between-class variance is maximized and within-class variance is minimized.
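
Here is a compact NumPy sketch of those three steps on synthetic two-class data (the data and the choice of a single discriminant direction are assumptions for illustration). A library implementation such as scikit-learn's LinearDiscriminantAnalysis performs the same computation behind the scenes.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class 0
X1 = rng.normal([3.0, 3.0], 1.0, size=(50, 2))   # class 1

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
m = np.vstack([X0, X1]).mean(axis=0)

# Step 1: between-class variance (scatter of the class means around the overall mean).
S_B = len(X0) * np.outer(m0 - m, m0 - m) + len(X1) * np.outer(m1 - m, m1 - m)

# Step 2: within-class variance (scatter of samples around their own class mean).
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Step 3: the directions maximizing between-class relative to within-class variance
# are the top eigenvectors of inv(S_W) @ S_B; project onto the leading one.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real     # leading discriminant direction

X_proj = np.vstack([X0, X1]) @ w                 # 1-D projection of all samples
print(X_proj.shape)                              # (100,)
```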

What contexts is LDA used in?

Linear discriminant analysis (LDA) is widely used in various fields due to its ability to simplify complex datasets while preserving class separability. Common applications include facial recognition, where LDA helps identify individuals by distinguishing facial features, and medical diagnostics, where it is used to classify disease states based on patient data. LDA is also employed in marketing to segment customers based on purchasing behavior and in finance for credit scoring, helping to predict whether individuals are likely to default on loans. Its versatility in classification tasks makes it an essential tool in many industries.

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”


— Ronald A. Fisher, statistician and creator of linear discriminant analysis

Key Terms

Dimensionality Reduction: The process of reducing the number of features in a dataset while maintaining its essential information.1 LDA achieves this by finding linear combinations of features that best separate the data into different classes.

Principal Component Analysis (PCA): A statistical technique that can simplify complex datasets by reducing their dimensions while retaining critical information.3 By identifying the key patterns or “principal components” that explain the maximum variance in the data, PCA helps uncover underlying structures and streamline decision-making processes, making it especially valuable for behavioral insights.

Class Separability: The ability to distinguish between different classes in a dataset. LDA seeks to maximize this separability by projecting data onto a lower-dimensional space where the distance between class means is as large as possible.

Bayes’ Theorem: A theory that underpins the framework of linear discriminant analysis (LDA), enabling it to classify data points by calculating the likelihood of a given observation belonging to each class. By combining prior probabilities and the evidence from feature distributions, LDA uses Bayes' theorem to find the decision boundaries that best separate groups within the data.

Logistic Regression:  A statistical method that can predict the probability of a binary outcome, such as whether a user will take a specific action or not, based on input features. By modeling the relationship between these features and the likelihood of an outcome through a sigmoid function, it provides actionable insights for decision-making in behavioral science and predictive analytics.

Eigenvalues and Eigenvectors: Essential elements of LDA used to identify the directions (eigenvectors) that maximize the class separability.2 The corresponding eigenvalues determine the importance of each direction in the projection.

Discriminant Function: A mathematical formula that classifies data points into distinct groups by maximizing the separation between categories. In behavioral science, it helps identify patterns in complex data, enabling researchers to predict group membership, such as classifying user behaviors or decision-making styles, based on measurable features.

Fisher Linear Discriminant (FLD):  The foundation of linear discriminant analysis, focused on finding the optimal projection that maximizes the separation between classes of data. By choosing the axis that preserves the most meaningful differences while minimizing overlap, FLD enables LDA to classify behavioral data with precision and uncover unique decision-making patterns.

Multiple Discriminant Analysis: An extension of LDA, used when classifying data into more than two groups. By identifying multiple discriminant functions, it enables behavioral scientists to analyze complex decision-making patterns across multiple categories.

Deep Learning: A subset of machine learning that uses artificial neural networks to model complex patterns and relationships in large datasets. By mimicking the structure of the human brain, it enables behavioral scientists to analyze nuanced decision-making processes and predict outcomes with high accuracy.

Support Vector Machines (SVMs): A powerful classification technique that finds the optimal hyperplane to separate different classes in a dataset. SVMs focus on maximizing the margin between classes, regardless of whether the data is linearly separable. Unlike LDA, which assumes Gaussian (normal) distributions and equal variance across classes, SVMs are more flexible and can handle complex decision boundaries, making them useful for analyzing non-linear relationships in behavioral data.

Neural Networks: A class of machine learning models inspired by the brain’s structure, designed to recognize intricate patterns through layers of interconnected nodes. While LDA assumes linear decision boundaries, neural networks can model highly non-linear relationships—making them ideal for analyzing complex behavioral patterns and predicting outcomes in dynamic decision-making environments.

History

The foundations of linear discriminant analysis (LDA) date back to the 1930s, when the concept of discriminant functions began to emerge in statistics. At the time, British biologist and statistician Ronald Fisher was laying down the groundwork for LDA in his 1936 paper “The Use of Multiple Measurements in Taxonomic Problems.”4 While finding ways to sort taxonomies of plant species, Fisher introduced the Fisher linear discriminant (FLD) as a means to discover linear combinations of features that best separated different classes. 

While studying iris flowers, Fisher hoped to ascertain a simple and effective method that reduced dimensionality while preserving class separability. Fisher’s initial work involved separating only two classes of flowers—unlike LDA today, where the analysis can consider several classes at a time. This advancement in LDA began in the mid-1940s when the mathematician C.R. Rao began tackling multi-class problems. As Rao looked into discriminant analysis problems, he became curious if he could eliminate some variables of a problem while still maintaining the appropriate information to discriminate the classes.5 Upon the discovery that this was indeed possible, Rao introduced the multi-class version of LDA, called multiple discriminant analysis.6

During the 1960s and 1970s, linear discriminant analysis gained traction as statistical computing technology and methods advanced—including the richness of datasets and the diversity of fields it was applied to. Researchers further formalized Fisher's method, improving the mathematical framework for high-dimensional datasets. The 1980s saw an expansion of LDA’s application to a variety of fields, such as speech and pattern recognition, which featured large, complex datasets that could greatly benefit from efficient classification techniques. This period also marked the development of computational algorithms that allowed LDA to handle multi-class problems beyond Fisher's original binary classification approach.

By the 1990s, LDA became a cornerstone in machine learning and data analysis, especially in fields like bioinformatics, finance, and computer vision. Researchers refined LDA algorithms to improve robustness and performance with non-normal data distributions and higher-dimensional spaces. With the rise of big data and artificial intelligence in recent years, LDA plays a key part in dimensionality reduction, feature extraction, and classification tasks, evolving alongside more sophisticated machine learning models like deep learning. Today, LDA is popular in fields like healthcare for disease classification, finance for credit risk prediction, and marketing for customer segmentation. 

In the future, linear discriminant analysis will continue to complement more complex models by offering simplicity and interpretability. While it may not always outperform sophisticated methods like deep learning, LDA’s ability to reduce dimensionality and define decision boundaries will remain valuable, particularly in high-dimensional, sparse, or noisy data scenarios. As datasets grow larger and more complex as we dive further into the era of big data, LDA could be further refined, potentially to be used in combination with other methods such as support vector machines or neural networks to combine the strengths of simplicity and predictive power.

People

Ronald A. Fisher

A British biologist and statistician who is the founding figure behind linear discriminant analysis. Fisher is also known for introducing the Fisher linear discriminant (FLD) in 1936, a technique for finding linear combinations of features for class separation that laid the foundation for modern LDA.

C. R. Rao

An Indian statistician who made significant contributions to the statistical theory of multivariate analysis, such as advancements in LDA.5 Rao’s work on the generalization of Fisher's method expanded LDA's applicability to more complex datasets and higher dimensions.

Herman Wold

A Swedish economist and statistician, Wold advanced multivariate analysis techniques, influencing the development of LDA for practical use in fields like economics and social sciences. His work contributed to the refinement of algorithms for improving LDA's robustness in real-world applications.

Impacts

As a powerful tool for fields like machine learning and data analysis, linear discriminant analysis helps us classify data in simpler ways overall. There are a few key benefits to the method, including reducing dimensions, being more accurate with classification, and versatility in where LDA can be applied.

Fewer Dimensions in Machine Learning

LDA has had a significant impact on dimensionality reduction by enabling the transformation of high-dimensional data into a lower-dimensional space while maintaining class separability. This makes it easier to visualize and analyze complex datasets, improving the performance of machine learning models.7

With fewer dimensions, there is a clearer differentiation between the classes in a dataset. Machine learning models can then filter out what data we need vs. what we don’t in a more efficient way. When we have a dataset that feels impossible to digest, dimension reduction can go a long way, allowing us to look at our data without all the noise.

Better Classification Accuracy

Linear discriminant analysis can discover what features are the most discriminative in a given dataset, which results in more accurate classifications. As LDA can maximize how separate classes are, there is less risk that data is misclassified, leading to better predictive power.3

By creating clear decision boundaries between classes, LDA has enhanced the accuracy of classification tasks across various fields, such as healthcare, finance, and marketing. Its ability to maximize class separability makes it a reliable tool for improving prediction outcomes in these industries.

Using LDA Across Fields 

Linear discriminant analysis is known to be a versatile method with strengths in classification and dimension reduction. Some common applications may be in fields where classifying and reducing dimensions are necessary for data analysis and decision-making, including but not limited to facial recognition, medical diagnosis, biometrics, marketing and customer segmentation, and pattern recognition.

Let’s briefly compare how LDA could be used in two of these fields to appreciate its versatility. For instance, LDA can improve facial recognition algorithms by reducing the number of dimensions in facial images while keeping the information needed to distinguish from person to person. In another setting, LDA may be able to sort out if a patient is healthy or at risk for harm from a certain disease based on their medical history and features. 

Controversies

No analysis is without its flaws, and linear discriminant analysis is no exception. There are some key drawbacks to LDA, such as its assumption that the data follows a normal distribution, its sensitivity to outliers, and its restriction to linear decision boundaries.

Assuming the Normal Distribution is True

Linear discriminant analysis assumes that data from each class follows a Gaussian (normal) distribution, which may not hold true when performing statistical analysis on many real-world datasets. When the data significantly deviates from normality, LDA’s performance can degrade, leading to inaccurate classifications. The suitability of LDA depends on how the data is distributed in the first place; if the underlying distribution significantly deviates from normality, alternative methods—such as non-parametric approaches—may be more appropriate.

The world is complex, and LDA may be applied to many complex problems. Despite its wide range of applications, this becomes a challenge when the data follows non-normal distributions such as the gamma, binomial, or exponential. These non-normal distributions are common in health, education, and social science data, which underscores the importance of having other, non-LDA methods in our toolkit for such cases.8

Those Sensitive Outliers 

Linear discriminant analysis is highly sensitive to outliers, which can distort the scatter matrices and reduce its ability to separate classes effectively. In datasets with significant outliers, LDA might misclassify data points or fail to find optimal decision boundaries. Outliers can be tough in general, as they are stand-out pieces of data that drag us away from a clean, linear, normal distribution.

There is still hope for staying normal (in our distribution, that is). LDA may be less sensitive to outliers than some related tools that depend heavily on individual data points,7 and there are a few ways we can address outliers. One is to simply remove them if they resulted from problems with data collection or measurement, while another is to choose a more robust treatment that is less sensitive to extreme data points, such as Winsorization, which caps extreme values at a chosen percentile rather than discarding them.
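
As a quick illustration, here is a minimal Winsorization sketch with SciPy; the spending figures are made up, and the 10% limits are an arbitrary choice for the example.

```python
import numpy as np
from scipy.stats.mstats import winsorize

spend = np.array([12, 14, 15, 13, 16, 14, 11, 15, 13, 250])  # one extreme outlier

# Cap the lowest and highest 10% of values at the nearest remaining observations.
capped = winsorize(spend, limits=[0.1, 0.1])
print(capped)  # the 250 is pulled down to the largest non-outlying value
```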

Limits of Being Linear 

LDA can only create linear decision boundaries between classes, making it less effective when the relationship between features and classes is non-linear. In such cases, more complex methods like kernel-based approaches or deep learning may outperform LDA. Just as not all data is normal, not all data is linear. 

This becomes an issue when applying LDA to a non-linear dataset, as there may be inaccuracies in identifying patterns. The problem of LDA not being capable of dealing with non-linear relationships feeds into a deeper issue of not being able to address complex patterns and boundaries. With more nuanced, sporadic data, we may prefer support vector machines or neural networks.7

Case Studies

LDA for Breast Cancer Diagnosis

Getting a medical diagnosis right can be challenging, which may be especially true when the tools we use to identify disease are not easily available. Breast cancer is the most prevalent malignancy among women worldwide, yet mammography to screen for breast cancer is not consistently performed in all general hospitals.9 The time between discovering a disease such as breast cancer and its treatment is a critical period for lowering the mortality rate, something that can be sped up with an approach based on linear discriminant analysis.

Adebiyi and colleagues looked into how a machine learning model using LDA could improve diagnostic accuracy for breast cancer, working with a dataset from Wisconsin, United States. With a sample of over 500 instances of breast cancer cases, the authors found that LDA, when used for feature extraction in combination with classifiers such as random forests and support vector machines, yields highly accurate diagnostic results of about 96%. It was the use of LDA for feature extraction in particular that is credited with improving the breast cancer prediction models.

These types of findings are substantial, as a technique like LDA can help identify when something like breast cancer is malignant or not. If we can diagnose and treat cancers earlier on with proper identification, thousands of lives can be saved. It is notable that LDA is not only limited to diagnosing cancers but may also be useful in other diagnostic procedures—perhaps even with mental illnesses and their many nuances. In the future, it will be interesting to see how machine learning methods using LDA may speed up diagnostics and promote greater accuracy prior to harm occurring.

LDA for Facial Recognition 

No two faces are exactly the same, even if you have some friends who are twins. Nowadays, facial recognition can be used to sign into our phones, open gates to get onto public transit, or even purchase items at a vending machine. In a facial recognition system, linear discriminant analysis can be used to classify faces based on features extracted from images, such as the distance between the eyes, nose shape, and jawline. Here, high-dimensional pixel data can be put into a lower-dimensional space to maximize the separability between different faces.

In the late 1990s, Belhumeur and colleagues did just that: they introduced the Fisherfaces method, which combines LDA with eigenfaces, another technique that projects face images linearly, to improve facial recognition.10 The authors made use of LDA to create an algorithm that applies a pattern classification approach. In other words, each individual pixel in a picture of a face can be treated as a point in a high-dimensional space. In developing their system, the authors found that combining LDA with the existing eigenface technique could help accurately identify faces even when lighting conditions and facial expressions were highly variable.

In this case study, researchers projected the data into a reduced space to maximize the variance between individuals' faces while minimizing the variance within each individual’s face. This type of application of LDA in facial recognition may be used for security systems, law enforcement, and digital media, illustrating its power in handling high-dimensional image data and making accurate, real-time identifications. Whether it's catching the FBI’s most-wanted face or signing you into your newest tech, LDA can help us recognize faces with greater accuracy. 

Related TDL Content

Decision Tree Analysis 

Sometimes, an LDA may not always fit the problem. In this piece, TDL columnist Isaac Koenig-Workman breaks down a decision tree analysis as a means for effective decision-making. Read more about its varied applications, including an interesting case study on young people’s mental health. 

Sensitivity Analysis

LDA is limited in its ability to handle outliers, so at times, a more sensitive analysis may be suitable. In this article, TDL columnist Emilie Rose Jones explains what a sensitivity analysis is and how it explores “what-if” scenarios to identify the factors that have bigger impacts than others in a company’s decision-making.

Sources

  1. Linear discriminant analysis in machine learning. (2024, March 20). GeeksforGeeks. https://www.geeksforgeeks.org/ml-linear-discriminant-analysis/
  2. What is linear discriminant analysis? (n.d.). IBM - United States. https://www.ibm.com/think/topics/linear-discriminant-analysis
  3. Ultimate guide to linear discriminant analysis (LDA). (2023, December 6). Dataaspirant - A Data Science Portal For Beginners. https://dataaspirant.com/linear-discriminant-analysis/
  4. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188. https://digital.library.adelaide.edu.au/server/api/core/bitstreams/1801cd68-028a-4380-a9c6-30ca9b0aa0d3/content
  5. Fujikoshi, Y. (2021). Contributions to multivariate analysis due to C. R. Rao and associated developments. Contributions to Statistics, 239-257. https://doi.org/10.1007/978-3-030-83670-2_11
  6. Mehta, A. (2022, April 28). Everything you need to know about linear discriminant analysis. Digital Vidya. https://www.digitalvidya.com/blog/linear-discriminant-analysis/#:~:text=Linear%20Discriminant%20Analysis%20was%20developed,apply%20to%20multi%2Dclass%20problems
  7. Ambika. (2023, September 13). Linear discriminant analysis (LDA) in machine learning: Example, concept and applications. Medium. https://medium.com/aimonks/linear-discriminant-analysis-lda-in-machine-learning-example-concept-and-applications-37f27e7c7e98
  8. Bono, R., Blanca, M. J., Arnau, J., & Gómez-Benito, J. (2017). Non-normal distributions commonly used in health, education, and social sciences: A systematic review. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.01602
  9. Adebiyi, M. O., Arowolo, M. O., Mshelia, M. D., & Olugbara, O. O. (2022). A linear discriminant analysis and classification model for breast cancer diagnosis. Applied Sciences, 12(22), 11455. https://doi.org/10.3390/app122211455
  10. Belhumeur, P., Hespanha, J., & Kriegman, D. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711-720. https://doi.org/10.1109/34.598228

About the Author

Isaac Koenig-Workman

Justice Interviewer @ Family Justice Services Division of B.C. Public Service

Isaac Koenig-Workman has several years of experience in roles to do with mental health support, group facilitation, and public speaking in a variety of government, nonprofit, and academic settings. He holds a Bachelor of Arts in Psychology from the University of British Columbia. Isaac has done a variety of research projects at the Attentional Neuroscience Lab and Centre for Gambling Research (CGR) with UBC's Psychology department, as well as contributions to the PolarUs App for bipolar disorder with UBC's Psychiatry department. In addition to writing for TDL he is currently a Justice Interviewer for the Family Justice Services Division of B.C. Public Service, where he determines client needs and provides options for legal action for families going through separation, divorce and other family law matters across the province.
