Think Aloud Protocol

The Basic Idea

Have you ever found yourself deeply engrossed in a complex task and then suddenly realized you were talking aloud to yourself? 

What you were doing is similar to something called the ‘think aloud protocol’ (TAP), also known as “concurrent verbalization.”

When used procedurally, TAP is like eavesdropping on someone’s inner thoughts. Here’s how it works. You ask an individual to engage in a task, such as navigating a digital interface, using a search engine, or completing a reading activity. While they work through the task, you also ask them to narrate what they are thinking moment by moment: in other words, to think aloud. 

The main idea behind TAP is that verbalizing one’s thoughts can reveal the cognitive processes behind user behavior. This method has been widely used in fields such as usability testing, human-computer interaction, and education to understand what people are thinking, doing, and feeling when interacting with a product or completing an activity. 

Jakob Nielsen, usability pioneer and author of Usability Engineering, believes that TAP is the single most important tool in the user experience (UX) toolbox.1 In usability testing, participants’ thoughts are used to identify potential problems with the design and find ways to improve UX. TAP can be applied to the testing of any type of product or service, such as websites, apps, games, devices, or physical objects. Any aspect of the design that confuses users, such as poor language and labelling, unclear help or instructional text, or inappropriate use of color or visual design, can be revealed through TAP. 

There are two different types of TAP: concurrent think aloud (CTA) and retrospective think aloud (RTA). CTA is the most common approach, where think aloud is done while the participant is completing the task. In RTA, participants remain silent while they interact with the product being tested. Once the task is complete, they are shown a ‘reminder’ of what they did (such as a video recording of the participant, or replay of screen activity or eye tracking) and are asked to describe what they were thinking or doing at different points in the interaction. We’ll look at the debates about which one is ‘better’ later in the article. 

Thinking aloud may be the single most valuable usability engineering method.


— Jakob Nielsen, author of Usability Engineering

Key Terms

User experience (UX): The overall interaction and satisfaction users have with a product, system, or service. It considers factors such as usability, accessibility, aesthetics, emotions, and the overall perception of the user throughout their entire journey, from initial interaction to completion of tasks. 

Usability: Refers to the ease with which users can interact with a product, system, or interface to achieve their goals effectively and satisfactorily. It encompasses factors such as accessibility, learnability, efficiency, memorability, and user satisfaction in the overall user experience.

Usability testing: A method used to evaluate the ease of use and UX of a product, such as a website, software application, or physical device. It involves observing real users as they interact with the product to identify usability issues, gather feedback, and make improvements to enhance the overall UX.

Interactive persona system: A tool used in UX design and product development to create and utilize detailed fictional characters, known as personas, that represent the target users of a product or service. These personas are typically based on research data and insights about the intended user demographic.

Likert scale: A psychometric scale commonly used in surveys and questionnaires to measure respondents’ attitudes, opinions, or perceptions about a particular topic or statement. The scale typically consists of a series of statements or items related to the topic being assessed. Respondents are asked to indicate their level of agreement or disagreement with each statement by selecting from a range of options. 

Gaze trail: In the context of eye tracking technology, gaze trail refers to the path traced by a person’s gaze as they look at and interact with visual stimuli, such as images, videos, or interfaces.

History

Since the time of Plato and Aristotle, humankind has been exploring the idea that the mind has the power to reflect on its own processes, or the ability to ‘cognitize cognition.’2 Known as ‘metacognition,’ this process refers to one’s awareness and understanding of their own cognitive processes, including knowledge about how they learn, think, remember, and solve problems. TAP can be understood as a modern application of these early philosophical ideas. 

One early precursor to TAP was the introspection method, which was developed and used by structuralist psychologists such as Wilhelm Wundt in the late 19th and early 20th centuries. Based on the belief that the mind could be broken down into its basic components through introspective analysis, the introspection method involved individuals reporting their internal thoughts and sensations while engaging in mental tasks. However, introspection eventually lost popularity due to concerns about its reliability and subjectivity.

In the 1960s and 1970s, researchers in cognitive psychology began to develop more systematic methods for studying cognitive processes. Psychologists such as Ulric Neisser and John Flavell were instrumental in advancing methods such as verbal protocols, which involved participants describing their thoughts and mental processes as they performed tasks. Flavell, in particular, conducted influential research on children’s cognitive development and metacognition, employing verbal protocols as a tool to understand how children think about their own thinking.3 Flavell laid the groundwork for the use of verbal protocols, including TAP, in studying cognitive processes across various age groups and domains.

The theoretical basis of protocol analysis, or the use of an individual’s own verbal reports as data to explore cognitive processes, was eventually outlined fully in Anders Ericsson and Herbert Simon’s seminal 1984 book Protocol Analysis: Verbal Reports as Data.4  

TAP was first introduced to the field of usability in 1982 by user interface expert Clayton Lewis while he was working at IBM. In a research report published by the corporation,5 Lewis outlined the strengths and weaknesses of the method and how it could be applied to studying the cognitive problems that people have in learning to use a computer system. Jakob Nielsen later cemented TAP’s place as a leading usability testing approach in his 1993 book Usability Engineering.6 Since then, TAP has become a staple in the toolbox of UX researchers worldwide. 

People

Wilhelm Wundt: German psychologist, philosopher, and professor who is known as the founder of structuralism and father of experimental psychology. Wundt played a leading role in developing the method of introspection to understand individuals’ cognitive processes. 

Ulric Neisser: German-American psychologist, often referred to as the father of cognitive psychology, who researched perception and active verbal memory.  

John Flavell: American developmental psychologist known for his ground-breaking work in understanding children’s cognitive development, particularly in the areas of metacognition and memory. 

Jakob Nielsen: Danish UX expert who created a set of essential rules called the ‘10 Usability Heuristics’ for creating user-friendly digital interfaces. In addition to his pioneering research, Nielsen is also co-founder of the Nielsen Norman Group, a leading consulting firm in the field of user experience. 

Anders Ericsson: Swedish psychologist specializing in expert performance and extended deliberate practice within domains such as medicine, music, chess, and sports. In particular, Ericsson studied verbal reports of thinking and people’s abilities to acquire exceptional memory performance. 

Herbert Simon: American political scientist whose work influenced the fields of computer science, economics, and cognitive psychology. In 1978 he received the Nobel Prize in Economic Sciences for his research into the decision-making process within economic organizations. 

Clayton Lewis: American cognitive scientist and computer science specialist known for his research on evaluation methods in user interface design. 

Consequences

So why conduct a think aloud protocol? 

Well, first and foremost, it’s inexpensive. TAP only requires a participant, a researcher, and the product or service to be tested. The test doesn’t rely on any special equipment, just a pen and paper to take notes or a simple audio recording device to capture the participants’ verbalizations. Moreover, because the protocol provides relatively rich insights, it can be carried out with low numbers of participants. 

TAP is also a pretty robust testing method. Even if you mess up the test by giving participants leading prompts or putting words into their mouths, you can still get reasonably good findings. 

Finally, TAP is very flexible and can be used at any stage of the product development lifecycle. It’s also easy to learn, so it can be conducted by any member of a design team. 

TAP is most valuable and effective when a participant encounters problems with a product or service (Great! That’s exactly what the protocol is for). We think and act faster than we can communicate, so when a user is working through a task without any issues, thinking aloud is of limited use (the observer sees what the user is doing before hearing them describe it). However, when a problem arises, the user slows down and the observer is able to correlate their actions with what they are saying.7

Controversies

In the world of usability testing, debates rage on as to which type of TAP (concurrent or retrospective) is better overall, and how each one is most suited to different testing situations. There are several smaller drawbacks to both approaches, but here we’ll just have a look at some of the main deliberations. 

Let’s look first at cognitive load. Evidence suggests that the additional cognitive load of CTA (that is, speaking and doing at the same time) may cause participants to be less successful with their tasks. In a usability study of a library website, researchers from the University of Twente found that participants were only successful with 37% of their tasks when using CTA, compared to 47% when using RTA.8

UX consultant and researcher Talke Hoppmann suggests that as the demand on users’ cognitive processes increases, thinking aloud may prove difficult for the participant.9 CTA, therefore, may be less effective when testing more complex tasks. 
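
To make this kind of comparison concrete, here is a minimal Python sketch of how task success rates per condition could be tallied from session records. The session rows, task names, and the success_rates function are all hypothetical and invented for illustration; they are not data or code from the studies cited above.

```python
from collections import defaultdict

# Hypothetical session log: (participant, condition, task, succeeded).
# These rows are invented for illustration and are NOT data from any cited study.
SESSIONS = [
    ("P1", "CTA", "find_book", True),
    ("P1", "CTA", "renew_loan", False),
    ("P2", "CTA", "find_book", False),
    ("P2", "CTA", "renew_loan", False),
    ("P3", "RTA", "find_book", True),
    ("P3", "RTA", "renew_loan", False),
    ("P4", "RTA", "find_book", True),
    ("P4", "RTA", "renew_loan", False),
]


def success_rates(sessions):
    """Return the share of successfully completed tasks for each condition."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for _participant, condition, _task, succeeded in sessions:
        attempts[condition] += 1
        if succeeded:
            successes[condition] += 1
    return {condition: successes[condition] / attempts[condition]
            for condition in attempts}


rates = success_rates(SESSIONS)
for condition, rate in sorted(rates.items()):
    print(f"{condition}: {rate:.0%} of tasks completed")
```

With so few rows, a gap between the two conditions says very little on its own; a real comparison would need enough participants and tasks to rule out chance.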

Now let’s look at memory. One of the benefits of CTA is that the verbalization happens in real time when participants’ thoughts are fresh in their memories. RTA, on the other hand, relies on participants’ ability to recall what they did in the task afterwards, a process which could be limited by cognitive biases such as the serial position effect.

And finally, we come to environment. Being asked to verbalize what you’re doing is an unnatural situation; indeed, many people struggle to sustain thinking aloud and need prompting. In this respect, RTA is more ‘realistic’ than CTA. However, CTA has the benefit of being able to provide real-time, visceral emotional responses that RTA can’t offer. 

TAP works well in individualist societies where people are accustomed, and even encouraged, to speak their mind and share their thoughts freely (although the desire to do this does differ from person to person). Yet in collectivist societies, or communities where discretion is expected, TAP may not be as effective or even culturally appropriate.10

In some cultures, such as those in Asia, the belief that silence and introspection are beneficial for high levels of thinking might adversely affect a participant’s ability to concentrate on a complex task or make them feel uncomfortable speaking out loud while completing it. In other instances, verbalizing one’s inner thoughts might not be culturally appropriate or encouraged outside homes and other private settings. Customs like this can have a negative impact not only on individuals’ willingness to engage in, or be completely honest during, the test, but also on their interactions with the researcher. 

Case Study: Enhancing the Think Aloud Protocol

For years, researchers have been experimenting with combining TAP with other data collection techniques, such as surveys and eye tracking data, in order to achieve richer UX insights. These endeavors, however, have had varying levels of success. 

Surveys

Professor Lene Nielsen and her colleagues at IT University of Copenhagen combined TAP with surveys, an approach they named the “think-aloud survey method,” during a user experiment.11 Rather than asking participants to speak aloud while completing a task, they asked participants to speak aloud while completing a questionnaire about their attitude towards and willingness to use an interactive persona system. When the researchers analyzed the transcripts and questionnaire responses, they found that the method provided deeper insights into the reasoning behind the participants’ Likert scale choices and responses to open-ended questions. Overall, the researchers suggest that the method can offer more nuanced insights for design and usability evaluations and an opportunity to dig deeper into the motivations behind participants’ choices. 

Eye tracking data

Attempts to marry RTA with eye tracking data, however, have not been as successful. A study conducted by Fatma Elbabour et al. explored whether additional eye tracking cues would help participants to remember what they did during their interaction with a website and elicit more verbalizations.12 The researchers conducted two RTA trials: in the first, participants were asked to verbalize what they did while just watching a recording of their performance; and in the second, this recording was complemented by a gaze trail of the participant’s eye movements. Contrary to the researchers’ expectations, there was no difference in the number of user problems discovered during the two conditions. One explanation put forward for this finding (which is supported by other studies on RTA and eye tracking13) is that eye movements might be too confronting and distracting for participants, and ultimately have a negative effect on their verbalizations. 

References

  1. Nielsen, J. (2012). Thinking Aloud: The #1 Usability Tool. Nielsen Norman Group. https://www.nngroup.com/articles/thinking-aloud-the-1-usability-tool/
  2. Hughes, A. J. (2019). Measuring Metacognitive Awareness: Applying Multiple, Triangulated, and Mixed-Methods Approaches for an Encompassing Measure of Metacognitive Awareness. Journal of Technology Education, 30(2), 3-20. 
  3. Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10), 906-911. https://doi.org/10.1037/0003-066X.34.10.906
  4. Ericsson, K. A., & Simon, H. A. (1984). Protocol Analysis: Verbal Reports as Data. The MIT Press. 
  5. Lewis, C. H. (1982). Using the “Thinking Aloud” Method in Cognitive Interface Design. IBM, Technical Report. RC-9265. 
  6. Nielsen, J. (1993). Usability Engineering. Morgan Kaufmann. 
  7. Beaton, A. et al. (n.d.). HCI Lecture 5: Think-Aloud Protocols. University of Glasgow. https://www.psy.gla.ac.uk/~steve/HCI/cscln/trail1/Lecture5.html
  8. Van den Haak, M. J., de Jong, M., & Schellens, P. J. (2004). Employing think-aloud protocols and constructive interaction to test the usability of online library catalogues: a methodological comparison. Interacting with Computers, 16(6), 1153-1170.
  9. Hoppmann, T. K. (2009). Examining the ‘point of frustration’. The think-aloud method applied to online research tasks. Quality & Quantity, 43, 211-224. 
  10. Barnum, C. M. (2021). Usability Testing Essentials. Elsevier. 
  11. Nielsen, L., Salminen, J., Jung, S-G., & Jansen, B. J. (2021). Think-Aloud Surveys: A Method for Eliciting Enhanced Insights During User Studies. Human-Computer Interaction – INTERACT 2021: 18th IFIP TC 13 International Conference, Bari, Italy, August 30 – September 3, 2021, Proceedings, Part V, 504-508. https://doi.org/10.1007/978-3-030-85607-6_67
  12. Elling, S., Lentz, L., & de Jong, M. (2011). Retrospective think-aloud method. Using eye movements as an extra cue for participants’ verbalizations. CHI ’11: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, May 2011, 1161-1170. https://doi.org/10.1145/1978942.1979116
  13. Elbabour, F., Alhadreti, O., & Mayhew, P. J. (2017). Eye Tracking in Retrospective Think-Aloud Usability Testing: Is There Added Value? Journal of Usability Studies, 12, 95-110.

About the Author

Dr. Lauren Braithwaite

Dr. Lauren Braithwaite is a Social and Behaviour Change Design and Partnerships consultant working in the international development sector. Lauren has worked with education programmes in Afghanistan, Australia, Mexico, and Rwanda, and from 2017–2019 she was Artistic Director of the Afghan Women’s Orchestra. Lauren earned her PhD in Education and MSc in Musicology from the University of Oxford, and her BA in Music from the University of Cambridge. When she’s not putting pen to paper, Lauren enjoys running marathons and spending time with her two dogs.
