How a NASA framework is inspiring a new approach in behavioral science


Feb 22, 2024

At TDL, we talk a lot about evidence-based decision-making. And when I say a lot, I mean A LOT. Huddled around our beloved coffee machine, we share insights on cool new research methods, surprising observations from user interviews, and evidence-driven ideas for digital products. We spend countless meetings (and emoji-filled Slack messages) determining how to best collect evidence, and how to apply insights to make important decisions across organizations, non-profits, research institutions, and regulatory agencies.

Suffice it to say, evidence is a core part of what we do. But the evidence that we gather in a consumer research project varies drastically from the evidence created in a larger behavior change pilot. As we commune around our coffee machine, it can be difficult to talk concretely about different types of insights and, importantly, what these insights mean for our partners and clients.

So, is there a better way to talk about behavioral insights? Can we better express when our work can reliably, equitably, and sustainably drive behavior change? Spoiler alert: there is. And it’s inspired by a NASA-driven idea: the Technology Readiness Levels.


Technology Readiness Levels

In 2020, a group of empirical psychologists (IJzerman, H., Lewis, N.A., Przybylski, A.K. et al.) discussed the key considerations of applying findings from behavioral science to larger policy-based decisions. They highlighted that we need to assess how “ready” a behavioral intervention is before persuading others to apply insights in high-stakes situations.

To assess readiness, they turned to the National Aeronautics and Space Administration (NASA) for inspiration. Specifically, they turned to NASA’s widely used “Technology Readiness Levels”, or “TRLs”.

What are the TRLs?

Technology Readiness Levels are a systematic way to assess how mature a technology is. While originally focused on creating “flight-proven” innovations, TRLs are also used in broader scientific contexts to assess innovation readiness. Is the innovation just an idea in your head? Has it been tested in a laboratory? Or has it actually been validated in its real context? The framework progresses from Level 1 all the way up to Level 9, with the innovation increasing in readiness as it moves through the levels.

Let’s say that I wanted to build a new coffee machine for our evidence-centric huddles. First, I would build a basic understanding of how water can be passed through coffee grounds to create a drink. TRL 1? Check! Moving up the chain, I would start to develop an idea of a machine that would be able to move water through coffee grounds. I’d then go out and build some prototypes, likely starting with some very ugly and flimsy designs, and progressing towards more sophisticated machines. I would test my new design, first at home (so as not to embarrass myself if something caught on fire), and then maybe at a friend’s house or a local cafe. Finally, I would bring it in for some final trials in the office itself and make any improvements, bringing me to TRL 9. Woohoo! Coffee-fueled chats for all.

Our psychologists of interest (IJzerman, H., Lewis, N.A., Przybylski, A.K. et al.) were inspired by this idea of increasing levels of readiness and felt that a similar concept could be applied to social and behavioral research. Thus, the “Evidence Readiness Levels” were born.

Evidence Readiness Levels

The Evidence Readiness Levels (ERLs) operate similarly to the TRLs. There are 9 levels, progressing from a more basic understanding of fundamental principles to a more nuanced understanding of a solution. 

However, they are not exactly the same. As the authors put it, “there are many differences between behavioral and rocket science”. With that in mind, let’s transition away from building a new coffee machine (a technology-centric context) towards “encouraging employees to clean up the coffee area after use” (a behavior-centric context).

ERL 1: Define the Problem(s) in Collaboration with Stakeholders

Our journey begins with a common office gripe: a messy coffee area. We gather (around the coffee machine in question) to discuss the issue. We collect input from the caffeine addicts who first came to us with the problem (our main stakeholders) and discuss the symptoms of a messy coffee area. Then, we explore the underlying behaviors that we think are contributing to this mess. Finally, we make sure that our main stakeholders can implement any ideas that we test. 

ERL 2: Consult People in the Target Settings to Assess the Problem’s Applicability

We then extend our inquiry to include everyone who uses the coffee machine, not just our main stakeholders. This helps confirm that cleanliness (or lack thereof) is a widespread issue, warranting a collective effort to address it. We also expand our search to our tea drinkers – because hey, they also use the kitchen, and are therefore an important part of our “coffee ecosystem”. 

ERL 3: Conduct Systematic Reviews to Select Potential Evidence of Candidate Solutions

With a clear understanding of the problem (and the people who are impacted), we dive into scientific research. What strategies have successfully encouraged responsibility and cleanliness in shared spaces? Are there coffee grinders that are less likely to produce a mess? We conduct a rigorous evaluation of the literature, identifying interventions that are likely to be reliable and can be generalized to our office setting. 

ERL 4: Select Measures; Evaluate Validity and Measurement Equivalence

After selecting our most promising interventions, we establish clear metrics, such as the frequency of cleaning incidents, the amount of coffee grounds on the table at the end of the day, and user satisfaction with the coffee area's cleanliness. We then double-check that the metrics are valid and applicable in our context. Are we relying on measures that are only useful for offices with coffee capsules, or have we chosen ones that also apply to espresso machines?

ERL 5: Compare Candidate Solutions in Observational Settings, Generating Formal Predictions for Positive Expected Effects and (Unintended Side Effects)

We then generate predictions for positive expected effects and any possible harms. What do we think the impact of each intervention will be? Will signage be effective in the short term and the long term? What impacts will new social norms have on the coffee area? This helps to ensure that our chosen strategies are not only effective, but are also ethically sound (e.g., Are we bullying people into cleaning up?) and unlikely to produce unintended negative outcomes (e.g., Will people stop drinking coffee out of fear of making a mess?).

ERL 6: Establish Causal Inference and Potential Side Effects in a Lab Environment, Testing Replicability via Cross-Validation

We move forward into a controlled pilot environment, possibly setting up a mock coffee area, ensuring that any observed behavior changes are directly attributable to our interventions. Our colleagues are particular about their coffee needs, so better to gather preliminary data and make changes before rocking the boat!

ERL 7: Test the Solution in a Variety of Settings and Stimuli in a Lab Environment

With initial results in hand, we broaden our testing to a low-stakes environment in the office. We make sure to test during different times of the day, under varying levels of busyness, and with different groups of people. We can then better understand how these different factors contribute to the success (or failure) of our interventions. And, importantly, we can ensure that the results that we’re getting can be replicated across various situations. Do our morning coffee drinkers exhibit the same response as our lunchtime ones?

ERL 8: Conduct Large-Scale Testing of the Solution in Settings as Close to the Target Settings as Possible

We're ready for a broader rollout! The most promising strategies from earlier trials are implemented during high-traffic periods and in the central coffee area. This higher-stakes testing phase allows us to validate the robustness and scalability of our solution. 

ERL 9: Use the Solution to Successfully Address a Crisis Situation; Feedback Evaluation to Expand Evidence

The ultimate test comes when the office faces a high-stakes situation (perhaps an all-hands-on-deck meeting or a team-wide social) that puts our cleanliness interventions to the test. Following this, we collect and analyze feedback, using these insights to further refine our approach.
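For readers who like to see a framework written down, the nine levels above can be sketched as an ordered enumeration. This is only an illustration, not something taken from the IJzerman et al. paper: the member names are our own abbreviations of the headings, and the `Insight` class and `next_level` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ERL(IntEnum):
    """Evidence Readiness Levels; names are abbreviations (an assumption)."""
    DEFINE_PROBLEM = 1
    CONSULT_TARGET_SETTINGS = 2
    SYSTEMATIC_REVIEW = 3
    SELECT_MEASURES = 4
    OBSERVATIONAL_COMPARISON = 5
    LAB_CAUSAL_INFERENCE = 6
    LAB_VARIED_SETTINGS = 7
    LARGE_SCALE_TESTING = 8
    CRISIS_USE_AND_FEEDBACK = 9


@dataclass
class Insight:
    """A hypothetical record pairing a finding with its readiness level."""
    summary: str
    level: ERL


def next_level(insight: Insight) -> Optional[ERL]:
    """Return the level an insight should aim for next, or None at ERL 9."""
    if insight.level is ERL.CRISIS_USE_AND_FEEDBACK:
        return None
    return ERL(insight.level + 1)


# Illustrative insights from the coffee-area example (made-up summaries).
insights = [
    Insight("Signage ideas found in the literature", ERL.SYSTEMATIC_REVIEW),
    Insight("Mock coffee area pilot results", ERL.LAB_CAUSAL_INFERENCE),
]

# Sort so the most "ready" evidence comes first.
ranked = sorted(insights, key=lambda i: i.level, reverse=True)
for insight in ranked:
    print(f"ERL {int(insight.level)}: {insight.summary}")
```

Because the levels are ordered integers, comparing two insights is a simple comparison of their levels, which mirrors the idea that readiness increases as an intervention moves up the scale.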

From Evidence to Impact

As our TDL team gathers once again around our coffee machine, we can now share cool new research methods (which we plan to apply at ERL 2), surprising observations from user interviews (which were gathered during an ERL 6 pilot), and evidence-driven ideas for digital products (collected during an ERL 3 investigation). 

We’re able to recognize that not all evidence is created equal – our ERL 3 insights are exciting (but really just educated guesses), while our ERL 6 insights are more robust (but still need some validation in different settings and environments).

By discussing the readiness of our behavioral insights, we can focus on creating real-world behavior change that sticks. At TDL, we’re lucky enough to work with partners and clients across the ERLs, from problems to pilots, from ideas to implementation. And now, we have a perfectly clean coffee area to fuel any new collaborations that may come (maybe that’s you?).


  1. IJzerman, H., Lewis, N.A., Przybylski, A.K. et al. Use caution when applying behavioural science to policy. Nat Hum Behav 4, 1092–1094 (2020).
  2. NASA. (2023, September 27). Technology readiness levels. NASA. 
  3. Government of Canada. (2018, January 23). Innovation, Science and Economic Development Canada, Innovation Canada.

About the Author


Alexi Michael

Alexi is a Consultant at The Decision Lab. Her expertise is multidisciplinary, spanning the fields of social innovation, artificial intelligence, and human-centered design.
