Linear Discriminant Analysis (LDA)
What is Linear Discriminant Analysis?
Linear discriminant analysis (LDA), also known as normal discriminant analysis (NDA) or discriminant function analysis (DFA), is a powerful dimensionality reduction technique widely used in machine learning and statistics. LDA enhances classification accuracy by identifying the optimal linear combinations of features that separate different classes within a dataset. By reducing complexity while preserving critical class distinctions, LDA improves model performance in applications such as pattern recognition, face recognition, and text classification.
The Basic Idea
After many years of running the show at the family restaurant, you've turned your pizzeria into a local favorite. Like any restaurateur, you're always trying to find new customers. Though (almost) everyone likes pizza, you want to definitively answer the question: “Which type of customer is most likely to eat my pizzas?” With participation from both regulars and first-time pizza eaters, you begin asking customers to complete a simple survey about themselves. With hundreds of customer data features collected, you need the right analysis to discover what type of person truly wants to buy your pizza. In this scenario, linear discriminant analysis may be the technique for you.
Linear discriminant analysis (LDA), also referred to as normal discriminant analysis (NDA) or discriminant function analysis (DFA), is a popular technique in machine learning and statistics used to reduce the number of dimensions in a dataset while maintaining the ability to distinguish between different classes.1 The main goal of LDA is to find the linear combinations of features that best separate two or more classes in the data. Unlike other dimensionality reduction methods like principal component analysis (PCA), which focuses on maximizing variance,1 LDA aims to maximize how well classes can be separated in a linear fashion.
LDA is a generative model, meaning it estimates how data is distributed for each class. Using Bayes' theorem, it calculates the probability that a new data point belongs to each class and assigns it accordingly.2 With the help of Bayes, LDA calculates conditional probabilities for a dataset: the probability that one event occurs given that another event has already occurred. If we were to use LDA algorithms at our pizzeria, the application of Bayes' theorem would help us check whether our assumption of likely customer types is accurate.
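To sketch the math behind this, the standard LDA model assumes each class k follows a Gaussian distribution with its own mean μ_k but a single covariance matrix Σ shared across all classes. Bayes' theorem then gives the posterior probability that a point x belongs to class k:

```latex
P(y = k \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma)}{\sum_{j} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma)}
```

Here π_k is the prior probability of class k. Because the covariance is shared, the boundaries where two posteriors are equal work out to be straight lines (or flat planes), which is where the "linear" in LDA comes from.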
In practice, linear discriminant analysis helps find a linear combination of characteristics that separates two or more classes of objects or events. As a dimensionality reduction method, LDA simplifies complex datasets by transforming data with multiple features (dimensions) into a lower-dimensional space. It does this while preserving the ability to distinguish between different classes, making classification more efficient and reducing computational complexity. Since LDA is adept at reducing dimensions, it can be applied to multi-class data classification problems, in contrast to analyses like logistic regression, which only works on binary classifications.2 Due to its versatile nature, it is common to use LDA as a means to improve the abilities of other classification algorithms, such as decision trees.
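To make this concrete, here is a minimal sketch using scikit-learn. The feature names and numbers below are invented for illustration; in practice you would substitute your own survey data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical survey data: 6 customers x 3 features
# (say, age, visits per month, average spend) -- entirely made up.
X = np.array([
    [25, 8, 18.0],
    [31, 6, 22.5],
    [22, 9, 15.0],
    [54, 1, 35.0],
    [61, 2, 40.5],
    [47, 1, 30.0],
])
# Class labels: 0 = "regular", 1 = "occasional visitor"
y = np.array([0, 0, 0, 1, 1, 1])

# Fit LDA and project the 3-feature data onto the single
# discriminant axis that best separates the two classes.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (6, 1): three dimensions reduced to one

# The same fitted model can classify a new customer directly.
print(lda.predict([[35, 5, 20.0]]))
```

Note that LDA can produce at most one fewer discriminant axis than the number of classes, which is why a two-class problem reduces to a single dimension here.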
LDA vs. PCA
It can be a bit tricky to understand how LDA is distinct from a similar approach called principal component analysis (PCA), so let's continue with the pizza example in the chart below to make sense of these analyses:3

| | LDA | PCA |
| --- | --- | --- |
| Supervision | Supervised: uses the class labels (e.g., regulars vs. first-timers) | Unsupervised: ignores class labels entirely |
| Objective | Maximizes the separation between classes | Maximizes the overall variance in the data |
| Dimensions produced | At most one fewer than the number of classes | Up to the number of original features |
| Pizza example | Finds the survey-feature combinations that best tell customer types apart | Finds the survey-feature combinations along which customers differ most, regardless of type |
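The difference is easy to see in code. Below is a small illustrative sketch (with synthetic data) that projects the same points onto one dimension with each method: PCA keeps the direction of greatest overall spread, while LDA keeps the direction that best separates the two labeled groups.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two synthetic customer groups that overlap along the high-variance
# axis but separate cleanly along a lower-variance one.
class_a = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(50, 2))
class_b = rng.normal(loc=[0.0, 3.0], scale=[5.0, 0.5], size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# PCA ignores the labels and keeps the direction of maximum variance.
pca_proj = PCA(n_components=1).fit_transform(X)

# LDA uses the labels and keeps the direction of maximum separation.
lda_proj = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

# Compare how well each 1-D projection separates the two groups.
for name, proj in [("PCA", pca_proj), ("LDA", lda_proj)]:
    gap = abs(proj[y == 0].mean() - proj[y == 1].mean())
    spread = proj.std()
    print(f"{name}: between-class gap / overall spread = {gap / spread:.2f}")
```

On data like this, LDA's ratio comes out far higher: the direction of maximum variance is not the direction that tells the groups apart.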
Eigenvectors and Eigenvalues
The goal of LDA, especially when we interpret it as a technique for reducing dimensions, is to separate the data along straight lines. In the math of linear functions, this is accomplished using eigenvectors and eigenvalues.2 To understand this, let's return to our pizza example. Once you've collected data on your customers, it's not as simple as picking pepperoni versus margherita: the data is all over the place, and you need a scatterplot to sort it out. Eigenvectors give us the directions of this scatterplot, indicating the direction along which the data is best separated, whereas eigenvalues indicate how important each of those directions is. A higher eigenvalue means its corresponding eigenvector carries more of the information that separates the classes.
When we conduct an LDA, the eigenvector calculations are based on two different scatter matrices computed from the collected data (written out formally after this list):2
- Within-class scatter matrix: Represents how spread out, or varied, the data points are within each class. LDA tries to minimize this scatter to ensure that points within the same class remain close together.
- Between-class scatter matrix: Represents the variation between different class means. LDA tries to maximize this scatter to push class means farther apart, improving separation.
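Written out, for classes k = 1, ..., K with N_k samples each, class means μ_k, and overall mean μ, the two scatter matrices are:

```latex
S_W = \sum_{k=1}^{K} \sum_{x_i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{\top},
\qquad
S_B = \sum_{k=1}^{K} N_k \,(\mu_k - \mu)(\mu_k - \mu)^{\top}
```

The eigenvectors and eigenvalues discussed above come from solving S_W^{-1} S_B v = λv; the eigenvectors v with the largest eigenvalues λ become the axes of the lower-dimensional space.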
How do you prepare to conduct an LDA?
The raw survey data won’t give us the answers we are looking for. If we really want to figure out who is buying our pizza, we need to sort the data first. Here are some best practices to follow before conducting a linear discriminant analysis (a code sketch covering several of them follows the list):2
- Preprocess the data so it is normal and centered: LDA assumes that the data follows a normal distribution, and mean-centering the data helps compute the scatter matrices correctly.
- Pick the right number of dimensions for the lower-dimensional space: Choose the number of discriminants by keeping the most informative eigenvalues or by testing performance in lower dimensions. We will come back to this “lower” dimensional space when we actually do the LDA.
- Regularize the chosen model: Regularization helps avoid overfitting, which is when a statistical model fits its training data so precisely that it fails to accommodate additional data or make reliable predictions.
- Apply cross-validation to assess how well the model is working: One way to assess classifiers is a confusion matrix, which checks whether a classifier is getting confused about the classes, naming one incorrectly as another! We know not everyone likes Hawaiian pizza.
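As a minimal sketch of how several of these practices look in scikit-learn (the data here is synthetic, and shrinkage-based regularization with the "lsqr" solver is just one common option):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer survey: 200 customers,
# 10 features, 3 customer types.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# Center and scale the features, then fit a regularized LDA.
# shrinkage="auto" (with the "lsqr" solver) shrinks the covariance
# estimate toward the identity, which helps avoid overfitting.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
)

# 5-fold cross-validation: every customer is predicted by a model
# that never saw them during training.
y_pred = cross_val_predict(model, X, y, cv=5)

# The confusion matrix shows which classes get mistaken for which.
print(confusion_matrix(y, y_pred))
```

Large off-diagonal counts in the printed matrix would flag exactly the kind of class confusion the last bullet warns about.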
How does LDA actually work?
When we apply an LDA, the data is projected onto a lower-dimensional space in a way that maximizes the separation between classes. LDA does this by identifying a set of linear discriminants that maximize the ratio of between-class variance to within-class variance.1 A simpler way to understand this is that LDA discovers the directions that best separate the various classes in the data.
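For a single projection direction w, this trade-off is captured by Fisher's criterion, the quantity LDA maximizes (using the scatter matrices S_B and S_W introduced earlier):

```latex
J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}
```

The directions that maximize J(w) are the leading eigenvectors of S_W^{-1} S_B, which ties the projection back to the eigenvector story above.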
There are three key steps to linear discriminant analysis from a computational perspective (a from-scratch sketch follows this list):2
- Find the between-class variance: how separated the classes are, measured as the distance between the class means.
- Find the within-class variance: how spread out the individual samples are around their own class mean.
- Project the data into a lower-dimensional space: choose the projection that maximizes between-class variance while minimizing within-class variance.
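Here are those three steps written out in NumPy. This is a bare-bones illustration, not a production implementation; a real library also handles numerical edge cases such as a singular within-class scatter matrix:

```python
import numpy as np

def lda_project(X, y, n_components):
    """Project X onto the top LDA discriminant directions."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]

    # Step 1: between-class scatter (distance of each class mean
    # from the overall mean, weighted by class size).
    S_B = np.zeros((n_features, n_features))
    # Step 2: within-class scatter (spread of samples around
    # their own class mean).
    S_W = np.zeros((n_features, n_features))
    for k in classes:
        X_k = X[y == k]
        mean_k = X_k.mean(axis=0)
        diff = (mean_k - overall_mean).reshape(-1, 1)
        S_B += len(X_k) * diff @ diff.T
        centered = X_k - mean_k
        S_W += centered.T @ centered

    # Step 3: solve the eigenproblem of S_W^{-1} S_B and keep the
    # eigenvectors with the largest eigenvalues as projection axes.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return X @ W

# Tiny usage example with two made-up classes.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 8.5]])
y = np.array([0, 0, 1, 1])
print(lda_project(X, y, n_components=1))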
In what contexts is LDA used?
Linear discriminant analysis (LDA) is widely used in various fields due to its ability to simplify complex datasets while preserving class separability. Common applications include facial recognition, where LDA helps identify individuals by distinguishing facial features, and medical diagnostics, where it is used to classify disease states based on patient data. LDA is also employed in marketing to segment customers based on purchasing behavior and in finance for credit scoring, helping to predict whether individuals are likely to default on loans. Its versatility in classification tasks makes it an essential tool in many industries.
“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
— Ronald A. Fisher, statistician and creator of linear discriminant analysis
About the Author
Isaac Koenig-Workman
Isaac Koenig-Workman has several years of experience in roles involving mental health support, group facilitation, and public speaking across government, nonprofit, and academic settings. He holds a Bachelor of Arts in Psychology from the University of British Columbia. Isaac has carried out a variety of research projects at the Attentional Neuroscience Lab and the Centre for Gambling Research (CGR) in UBC's Psychology department, and has contributed to the PolarUs App for bipolar disorder with UBC's Psychiatry department. In addition to writing for TDL, he is currently a Justice Interviewer for the Family Justice Services Division of the B.C. Public Service, where he determines client needs and provides options for legal action for families going through separation, divorce, and other family law matters across the province.