Behavioral Science, Applied. - The Decision Lab

The Basic Idea

You likely own an electronic device equipped with speech recognition technology, for example, your smartphone or smart home device. At first glance, the process behind speech recognition might seem straightforward: the electronic device recognizes the sounds you're making, translates them into written text, and receives them as commands. However, the actual process involves sophisticated tech and detailed processes¹:

Signal Processing: The first step in speech recognition involves converting spoken words into a digital format that a computer can understand. When you speak, a microphone captures your voice as an acoustic signal. This signal is transformed into a digital representation the computer can process. This step lays the foundation for the next stages.

Speech Feature Extraction: You can think of this step as the sieve of speech recognition. The main goal is to filter out the background noise and irrelevant information, only keeping the critical data for the next stages. The computer recognizes features in your voice such as pitch, tone and rhythm to make an accurate extraction.

Acoustic & Language Models: Extracted features are then recognized and processed by acoustic and language models. The acoustic model maps these features to phonemes (the basic units of sound in a language), while the language model adds context by understanding how words typically fit together in a language. These models use predictions based on previous words to guess and confirm the next words, ensuring coherence and increasing accuracy.

Decoding: The final stage involves combining the outputs of acoustic and language models to generate the most probable transcription of what was spoken. This involves searching through possible combinations of words to find the sequence that best matches the input.

It’s also important to differentiate between speech recognition and voice recognition. Speech recognition technology focuses on what is being said, without considering who is saying it. On the other hand, voice recognition is concerned with identifying who is speaking, rather than the content of what they are saying.²

For example, Amazon’s Alexa is equipped with both technologies. Imagine your home's security alarm is set up with Alexa, but you also have children, and you need to ensure that the alarm isn’t activated accidentally by one of them. Through speech recognition, Alexa understands the command to activate the alarm, but through voice recognition, it can differentiate between your voice and your child’s, preventing unauthorized activation or deactivation.

Now, you might be asking: but how does it actually “understand” humans? This is where artificial intelligence, and more specifically deep learning comes in. Although other options are possible such as Gaussian mixture models (GMM) or hidden Markov models (HMM), most software and devices use deep learning.

This type of AI learns from “experience,” or vast amounts of data. Specifically, Alexa is really good at interpreting humans because it has been “fed” millions of commands and human voices. Through speech recognition and deep learning, technology is able to understand and interact with us making it significantly more intuitive and responsive than ever.³

About the Author

Mariana Ontañón

Mariana holds a BSc in Pharmaceutical Biological Chemistry and a MSc in Women’s Health. She’s passionate about understanding human behavior in a hollistic way. Mariana combines her knowledge of health sciences with a keen interest in how societal factors influence individual behaviors. Her writing bridges the gap between intricate scientific information and everyday understanding, aiming to foster informed decisions.

Consulting

Industries

Resources

Speech Recognition

What is Speech Recognition?

The Basic Idea

Case studies

From Insight to Impact: Our Success Stories

Is there a problem we can help with?

About the Author

Mariana Ontañón

About us

We are the leading applied research & innovation consultancy

Our insights are leveraged by the most ambitious organizations

OUR CLIENT SUCCESS

Annual Revenue Increase

Increase in Monthly Users

Reduction In Design Time

Reduction in Client Drop-Off

Read Next

Perceptual Set

Curse of Knowledge

Storyboard

Risk Analysis

Eager to learn about how behavioral science can help your organization?

Consulting

Industries

Resources

Speech Recognition

What is Speech Recognition?

The Basic Idea

Case studies

From Insight to Impact: Our Success Stories

Is there a problem we can help with?

About the Author

Mariana Ontañón

About us

We are the leading applied research & innovation consultancy

Our insights are leveraged by the most ambitious organizations

OUR CLIENT SUCCESS

Annual Revenue Increase

Increase in Monthly Users

Reduction In Design Time

Reduction in Client Drop-Off

Read Next

Perceptual Set

Curse of Knowledge

Storyboard

Risk Analysis

Eager to learn about how behavioral science can help your organization?

Get new behavioral science insights in your inbox every month.