Bringing Behavioral Science into the Real World with Dilip Soman
Podcast, October 25th, 2021
In this episode of the Decision Corner, Brooke speaks with Dilip Soman, Canada Research Chair in Behavioural Science & Economics, University of Toronto Professor and Director of the Behavioural Economics in Action Research (BEAR) Centre at Rotman School of Management. Together they explore the translation of behavioral science theory into practice, common intervention pitfalls, and the types of strategies organizations and individuals can implement to make their interventions more robust and, ultimately, more successful. Some of the topics discussed include:
- Why ‘shopping at the nudge store’ doesn’t always lead to the best outcomes, and how practitioners should consider the unique ‘supposedly irrelevant factors’ that exist in their particular context.
- The ladder of evidence – adopting a variety of approaches to intervention testing rather than defaulting to yet another randomized controlled trial.
- Moving beyond statistical averages and considering the larger picture.
- Why a house listed for $1 will likely get a much higher sale price than the predetermined asking price.
- Organizational and psychological barriers to intervention testing and experimentation.
- How individuals can catalyze change in their organizations, and overcome some of the human biases that impede the ‘discipline of testing’.
The Nudge Store – Why It Might Not Have the Exact Intervention You Need
“The nudge store idea is kind of like saying, “Okay, let me do a meta-analysis, let me see what the most common success stories are, and I’m just going to take them off the shelf, bring it to my lab, or bring it to my field, and implement them.” And it doesn’t work, because for lots of good reasons, your ‘supposedly irrelevant factors’ are going to be different.”
Translating Theory to Practice, and the Role of Incentives
“Well, I actually am not convinced that knowledge creation in itself is a problem, but translation is. So for example, if I look at the incentive structures for most academics, I’m not incentivized to say, “Look, I found that this intervention works, but only if the following 500 conditions are met.” I’m incentivized to say “Changing defaults is a boon to save the world from all kinds of social problems.”
The Discipline of Testing
“Theory is great. It’s going to help you come up with a bunch of starting hypotheses, it’s going to help you with a first draft of your intervention, but unless it is tested in the context in which it’s actually going to be deployed, that’s going to be a problem. And so the tension between precision and utility in the field is best resolved by creating that discipline of testing. We can be precise about the process, if not about the outcome.”
Context is King – Learning From a Behavioural Finance Intervention in Mexico
“In most western societies, the oldest “trick in the book” is to ask people to appeal to their own future self. It turns out in Mexico, that didn’t work. It didn’t work because appealing to your own future self was seen as selfish, it’s seen as taking away money from your existing family. And so we played around, we changed the messaging. It was now about the family’s future. That worked.”
The First Step for Organizations
“I think the first thing they can do is to simply document past successes and variables, and try and explain variations through these, what we call SIFs, ‘Supposedly Irrelevant Factors’.”
How Human Bias Affects Our Measure of Intervention Effectiveness – The Case of Overconfidence & Feedback
“In the overconfidence research, there’s a classic finding that if you compare overconfidence by profession, it turns out the least overconfident people are weather forecasters. They may not be accurate, but they’re not overconfident about their accuracy, and it’s because they get feedback every single day. If I’m a physician on the other hand, or a rocket scientist, I don’t get too much feedback. If I’m a doctor, somebody comes to me with a headache, I’ll prescribe pills perhaps, and then if all goes well, they never come back. There might be other reasons why they don’t come back.”
Brooke Struck: Hello everyone, and welcome to the podcast of The Decision Lab, a socially conscious applied research firm that uses behavioral science to improve outcomes for all of society. My name is Brooke Struck, a research director at TDL, and I’ll be your host for the discussion. My guest today is Dilip Soman, Canada Research Chair in Behavioral Science and Economics, and Director at BEAR, the Behavioral Economics In Action Group at the Rotman School of Business, University of Toronto. In today’s episode, we’ll be talking about moving from knowledge to action, how to translate nudges into the field, how to scale them up, and pitfalls to watch out for along the way. Dilip, thanks for joining us.
Dilip Soman: My pleasure, Brooke. And it’s a lovely topic, something that I’m very passionate about studying, so it’s lovely to be here.
Common Behavioural Intervention Pitfalls – The ‘Nudge’ Store
Brooke Struck: What challenges are you seeing in the way that people are trying to design behavioral interventions? What’s keeping you up at night that’s got you working on this topic?
Dilip Soman: Well, I’m not sure it’s keeping me up at night, but I think the challenge for our field has been the fact that we were a growth field, and now we are at maturity. So I think we’ve gotten to a point where people know that behavioral interventions work, we know behavioral science is important, we know that it’s really important to understand the science of nudging, but now we are at a point where we need to scale it up, and we’re at a point at which we need to standardize processes. And I think any field that goes through those growing pains will know exactly what that entails.
So for example, in all of our previous history as a field, let’s say the past 14 or 15 years, we’ve focused on successes. We’ve focused on things that worked. Our journals are full of examples of interventions that worked, we haven’t paid as much attention to things that didn’t work, and why they didn’t work, and when they didn’t work. And I think that’s the kind of stuff that we need to start thinking through now, the idea that just because an intervention worked for Brooke, let’s say you and your group in Montreal, doesn’t mean it’s going to replicate for me and mine next week in Toronto. So I think that’s our next challenge.
Brooke Struck: In a previous conversation, you used an analogy that I absolutely loved. People wanting to use behavioral insights in practice assume that they can just walk down to their local nudge store, pick something up off the shelf and expect it to work out there in the wild. Now that analogy is really compelling, but when we think about things that way, and when we behave that way in our practice, often what we find is that the nudges don’t work or they’re not as effective necessarily as they had been demonstrated to be previously in other contexts. Walk us through what’s going wrong there?
Dilip Soman: Great question. It’s a fabulous question because I think it gets to the heart of what makes behavioral science different to a practitioner than most other sciences. And let me illustrate that with another example, before we get to the nudge store. So suppose I lift an object, and I hold it up and I say, what’s going to happen if I release that object? It is perhaps uncontroversial for most people to say that the object will fall to the next possible surface, the floor, perhaps. And it doesn’t matter who holds up the object, it doesn’t matter which country we’re in, what language we speak. The object is going to fall to the floor, and that’s because the brackets of the context in which that particular theory, the theory of gravity, holds true are pretty large, or maybe the phenomenon of an object falling to the ground.
Now if I took that same object to the moon or some other celestial body and I released it, it probably isn’t going to fall to the ground immediately. And so every theory has its moderators, as I guess we call them in psychology: situations under which the theory results in a particular outcome, and situations in which it doesn’t. And it’s just that in most other theories those brackets of outcome spaces are really, really large. So think about physics, think about engineering, think about microbiology. And so if I’m actually using some of those sciences where effects are pretty robust on planet earth, then the way I think about science is very different. Then I can actually do a meta-analysis, I can figure out what the generalizable insights are, what are the most common phenomena. And if they have been demonstrated by a lab in Germany or South Africa, they’re going to work here because nothing in the context that is relevant to that phenomenon has changed.
Now, behavioral science is different. We know from the work of Richard Thaler, and in fact, everybody else, that little things in the environment that we think shouldn’t matter, matter. And so this could be all kinds of things. It could be time of the day, it could be whether you are inside or outside a store. Thaler called them supposedly irrelevant factors. And so all of our phenomena are predicated on a set of those supposedly irrelevant factors. And so what that ends up creating is the belief that something I read about in a paper is easily replicable.
Why? Because no paper is going to go in and document every single aspect of the context. And so the nudge store idea is kind of like saying, “Okay, let me do a meta-analysis, let me see what the most common success stories are, and I’m just going to take them off the shelf, bring it to my lab, or bring it to my field, and implement them.” And it doesn’t work, because for lots of good reasons, your supposedly irrelevant factors are going to be different. And I can share lots of examples of that.
Brooke Struck: Actually, if you have an example off the top of your head, that’d be great.
Dilip Soman: Yeah. I mean, several years ago, I and many other people did research on credit card spending: the idea that, everything else held constant, people tend to spend more if they pay using a credit card compared to other forms of payment. (And keep in mind, this was the 1990s, so credit cards were like the state-of-the-art payment technology back then. Anything else included things like cash or check or barter, I suppose.)
In the lab, we did it by making sure people had access to liquidity, so that wasn’t an issue, it’s just that the “store” in the lab only accepted either a check or a credit card, and many people have shown that they spend more when using credit cards. And now the question is, why does that happen? There was an elaborate theory for why it happened, the idea being that certain payment mechanisms leave a weaker trace in your memory. So when I’m faced with a new purchase opportunity, the question that I ask myself is, “But gee, how much have I spent in the recent past on things like this?”. And if the answer is “I’ve spent a lot”, then I’m less likely to make a new purchase. If the answer is that I haven’t spent as much, then I’m okay making the purchase. And if I’m paying by credit card, I just don’t remember those past expenses.
So that was a story there. And obviously there were many other theoretical explanations, but that’s the key explanation in some of my work. And so I said to myself in the lab, how do I reduce this effect? And the answer was simple, it was simply to give people feedback on how much they had spent. So imagine a world in which you’ve got a mobile wallet, and every time you tap to make a payment, it just pauses and shows you a list of your last 15-20 expenses. So it seemed like something that could have been done easily. And so we ran studies giving people feedback, and it works beautifully. Other people have tested this in a field setting, we’ve done it in the lab.
And so there’s a fairly robust set of effects there. Now, it turns out in 2010 or so, the government in South Korea decided to change policy. And the primary reason for this policy was to prevent fraud, but they wanted to send such reminders to people that had just made a credit card transaction. Now, obviously this is common nowadays in many parts of the world, back then it was very revolutionary. And in their policy documents they said, “Well, yes, we want to prevent fraud, but we think this will help people be more prudent because they now have feedback on every single expense that they made.” So we looked at the data, and surprise, surprise, we actually found the opposite effect, that on average, people who had opted in to receive these notifications spent more instead of less. And so of course we found that interesting, we combed that data a little bit more, and found out that there were two groups. There were heavy spenders, people that had large credit card bills for whom the text message actually worked in decreasing their spending, but for the vast majority, 85% or so, it increased spending.
Now, why did that happen? There was an important difference in terms of how the message was delivered in the lab or in the previous field experiment, versus here. In the previous experiments, the reminder came in on the same interface that you were making a payment. So you swiped the card on that kiosk or the screen and it said, “This is your past expense.” Same thing in the lab, whereas here it came to a separate device, it was sent through with a text message. And at some point in time you think, “Well, okay, so what’s the difference?” Well, the difference is you don’t have to look at your phone when you’re making a purchase, so it wasn’t salient, but it actually also created a secondary effect, which in a lot of research on consumer psychology and the use of technology, we call digital dependency. And so when we interviewed people, they would say things like, “Oh, if I ever needed a record, I know my phone has it.” And so instead of being more vigilant with their spending, which is what the reminder was supposed to do, they actually outsourced all of that to a phone. And so they were actually more disengaged than they were previously.
So in hindsight, a little difference in terms of how that message was delivered, supposedly irrelevant, was massively relevant. And so that’s one example that jumps out, but there’s a lot of others.
Brooke Struck: Yeah, that’s great. And these supposedly irrelevant factors, I think as your example so nicely illustrates, are ones that will often surprise us. We don’t know where to look for them before they crop up and kind of smack us in the face.
Dilip Soman: That’s correct. And I think Richard Thaler and others have written about the fact that a lot of these things make sense after it’s happened, and this was true for us too. When we found the effect had backfired, we could explain it, but perhaps if we were there at the time of designing the intervention, maybe it would’ve escaped us too, we wouldn’t even have thought of this. And so that’s a really important point, these things creep up on you.
Translating Theory to Practice – The Ladder of Evidence
Brooke Struck: So I want to go back to one of the points you were talking about earlier around these brackets or these conditions under which effects will be robust. The model of knowledge creation in modern western science, which is also the kind of epistemological model that’s embedded in a lot of our industrial systems, is all about consistency and standardization. We standardize a thing so that we know that it’s always going to behave exactly the same way, and then we can predict it, we can model it, all these things. So it’s really primarily about control, but this philosophy really clashes with a situation where what we’re trying to control is human behavior, because there are so many of these non-standardizable features that we don’t even know to look for before they crop up. So what are the tensions that this difference creates in terms of both knowledge creation, but also trying to mobilize knowledge in the field and actually create interventions that work?
Dilip Soman: Yeah. I think there’s a lot in that question, so let’s try and unpack that a little bit. And I think I resonate with your comment about the push for standardization. I’m an engineer by training, that was my first degree, and so I know exactly what you mean. And I do think in theory, if we had what I’m going to call a ‘grand unified theory of human decision-making’, which we do not, it would still be possible to standardize stuff. I mean, we could still say under a certain set of conditions that involve 500 variables, this is the standard intervention. It’s just going to be too complicated and too clunky for it to be useful. And so I think to the extent that the dimensions on which human behavior can be so vastly different, yes, in theory, we could have standardization, but in practice, that’s going to be hard.
So what does that mean for knowledge creation? Well, I actually am not convinced that knowledge creation in itself is a problem, but translation is. So for example, if I look at the incentive structures for most academics, I’m not incentivized to say, “Look, I found that this intervention works, but only if the following 500 conditions are met.” I’m incentivized to say “Changing defaults is a boon to save the world from all kinds of social problems.” Now that doesn’t mean you actually say that, but oftentimes people read it that way. So one of the questions that we are grappling with is: if we can help a practitioner understand which features were responsible for the effect that was found, could that perhaps push them to be a bit more cautious about just going to the nudge store and picking things up off the shelf?
So I think it’s really in practice that this imposes a challenge. I think in theory, like I say, if we did have a grand unified theory, then we could be precise and useful, but we don’t. And so we can either be precise or we can be useful. And I think what we see in certain parts of the academic literature is a lot of precision, but again, we don’t know under what conditions, there are simply too many, and so as a practitioner, it doesn’t help me. Or we can be less theoretical, less precise with the theory, but more practical by giving people pragmatic advice. And that’s been my focus and the focus of our work in a partnership that we call the Behaviorally Informed Organizations Partnership.
We’re saying, look, theory is great. It’s going to help you come up with a bunch of starting hypotheses, it’s going to help you with a first draft of your intervention, but unless it is tested in the context in which it’s actually going to be deployed, that’s going to be a problem. And so the tension between precision and utility in the field, I think is best resolved by creating that discipline of testing. We can be precise about the process, if not about the outcome.
Brooke Struck: So testing is kind of the solution to this problem of trying to figure out whether there are additional challenges in my practice ecosystem that I hadn’t anticipated just based on what’s out there in the literature. To go back to our nudge store example, I take something off the shelf and I go and use it, but I don’t use it blindly. I don’t go in assuming all of this is just going to work. What I’m taking off the shelf is a hypothesis. The better the knowledge creation is, the stronger the hypotheses are that you’re going to be pulling off the shelf. And when I say a stronger hypothesis, I mean one that has a higher likelihood of being true.
Dilip Soman: Correct. So I absolutely agree with what you’re saying. I do think there are a couple more challenges as we start diving into this whole issue of trying to see how knowledge can be best applied in the field. I think one of the other big challenges is, again, back to the incentive structure of academics and practitioners. Let’s say I’m an academic and I’m interested in a particular phenomenon, a particular theory. I might have made a career out of studying mental accounting or the pain of payment or whatever else. You might not care about that. You simply want to get people to purchase things, or to not purchase things, or to save for the future. And so you are seeing the entire collection of academic work as a toolkit, as a portfolio from which you’re looking to choose tools, whereas I’m only providing one of those tools. Now, in engineering, we have a very clear-cut way of thinking through what’s the right tool for what situation. In the behavioral sciences, we don’t.
So let’s imagine I’m trying to encourage people to save more for retirement. This is a project that we were working on in Mexico in partnership with ideas42, as well as the government agency there. If I look at the literature, they’d say, “Well, you can change the framing of the message. Instead of telling people how much they will gain in the future, you can tell them how much they would lose if they did not contribute. You could introduce implementation intention prompts. You could think about just increasing closeness with your future self.” There’s all of these things that have been demonstrated in the literature. Is there any research which tells me which of these three things is the best thing to do?
The answer is no. And the answer is no because nobody has the incentive to do it. Practitioners are too busy solving problems, academics are too busy in their particular silos. I think we need a lot more of that research, because otherwise how’s a practitioner going to figure out what the right tool to use is. So again, in that world, I think some preliminary testing helps. And then the question is, what does it actually mean to test? What does it mean to take an idea, develop a starting hypothesis, and then roll it out into the field as a test? I like to think about it as a ladder-of-evidence kind of idea. I don’t think everybody should start off by doing a massive randomized controlled trial. That’d just be a mistake.
But start off with a survey, start off with simple lab experiments where you can see what people’s hypothetical reaction to an intervention might be, then think about a design session, then think about a more complicated experiment, then think about a trial. And so I think as we move through this ladder of evidence, we can start off with a wide collection of options that we want to test, and narrow them down successively until we’re left with a few manageable ones to test in a randomized trial. So I think that’s where our efforts should be, how to fine-tune that process to help people make sense of the knowledge.
How Can Academia Move Away from the RCT?
Brooke Struck: I like that concept of a ladder of evidence very much. It’s something that here at The Decision Lab, I think we put into practice, though we’ve never had a name for the concept that fits so nicely as that. One of the things that we found though, is that this kind of ladder of evidence approach is really disjointed from the experiences that a lot of people have coming out of academia. You mentioned the initial reflex is to run a massive randomized controlled trial. Where is it that we learn? Where is it that we train people in the field to actually develop their skills, and also their reflexes, to reach for different rungs on that ladder, rather than always going to the top rung and assuming that an RCT is the first thing?
Dilip Soman: Yeah. I mean, I think the challenge with the PhD training is, of course you teach people at the top rung with the assumption that they know how to get there. That assumption might not always be true. And I think the reason it might not always be true, is that the simple geography of the place where you get your PhD, or the kind of people that you’re working with on your PhD formative years, shapes the part of the world that you think is representative of the whole world. So we have this projection bias, we think the whole world looks a lot more like ours than it actually does, and so I think really pushing and inculcating the discipline of listening and talking and observing is something we haven’t done a good job of, not just in academic programs, but also in the field.
I think a lot of people go in with, we call it solution-mindedness. They have an intervention that they know they want to work, and so it’s really a case of doing a motivated search of the environment. We’ve tried really hard to develop tools to do a simple audit. And you guys do the same thing too; let’s look at the emotions that people go through, let’s look at the cognitions, let’s understand the motivations, let’s look at the perceptions, are people looking at the problem the same way as you’re looking at it? So I remember many years back when I was doing research on savings in the Global South, I remember going to Thailand, talking to a farmer who had absolutely nothing saved up in the bank, and my colleague at that point in time asked him whether he was worried about the fact that he had nothing saved up, he was 70 years old.
And he looked back and said, “Well, I have four sons, why would I be worried?” And that moment was enlightening for me, because I think it was the fact that we had always assumed that the unit of analysis is an individual, it’s not a group. Same thing with Indigenous populations in Canada. We think of ourselves as individuals, but most Indigenous people operate in communities. It’s the community that’s the unit of analysis. Look at our bank accounts, these are individual bank accounts. Look at our retirement products, they’re individual products. Our identities are individual. All of our documentation, our official systems are all based on individual units of analysis. And I think unless we change that, we will always fall into this trap of thinking that everybody else operates the same way as we do.
So I think it’s really important to build that discipline, develop these frameworks and give people a checklist of what to observe. Is it the same unit of analysis? Are the socioeconomic structures different? Are the institutions different? Is the family structure different? Is the governmental support different? So one of my other pet areas of interest is welfare programs, things like cash transfer programs. And a lot of people assume that the government can easily transfer money to low-income populations, because everybody has a driver’s license and everybody has an address. Guess what? In many parts of the world, they don’t. And so you go to the favelas in Latin America, the slums in South Asia, people do not have addresses, they do not have phone numbers, they do not have bank accounts, and so we start with those kinds of challenges. So as a global researcher, it is always tempting to impose our worldview, but I think we need to do a much better job of pushing people to study those bottom rungs of the ladder.
Look At The Bigger Picture – Distributions Versus Averages
Brooke Struck: So let’s dig into that a little bit. In our approach to research in the behavioral sciences, we tend to focus a lot on averages. So the way that we actually do statistical analysis is often focused on the kind of mean difference that you’re able to achieve with one intervention as opposed to another, but what you’re talking about here is not just random noise in the distribution, what you’re talking about is systematic deviations from the mean. So what happens to these outliers when we have circumstances where we’re trying to establish whether a nudge works, and the population that we’re testing it out on is actually very heterogeneous, in ways that we might not anticipate until we’ve already figured out that something’s broken and it doesn’t work?
Dilip Soman: So I think there are two aspects to what you’ve said. I think one is the fact that, as researchers, our dependent measures of interest are usually averages, when in fact, we should be looking at the distributions. We don’t do that. So I can think of many examples where interventions changed the distribution, but not the average. And it’s tempting to say, oh, the intervention didn’t work. So I’m going to give you a completely different example from a different domain. This is not a behavioral intervention in the classic sense. But several years ago, I was sitting in my office, minding my own business, when a reporter called and said, “I saw this real estate agent doing a listing in which they have listed a house in a very upscale part of Toronto at a dollar, $1, why do you think they did that?”
And so of course there’s a lot to unpack there. Let’s not go into it. The one thing that I did do was run a few simple survey-based experiments. And I think in hindsight, this makes perfect sense, the idea that when you give people a listing price and they’re looking at the property, they’re going to say, “Okay, they asked for a million and a half, is it worth that much to me, or is it worth more or less? And if more, how much more?” So we are anchoring on the listing. If we don’t have an anchor, then it turns out we are not doing that, we are actually building our valuation from first principles. And long story short, based on some of the studies that I did (these are not very clean, so not in a journal at this point in time), the averages with and without a listing price don’t change, but the distribution does.
So in the $1 case, you get some really high valuations and some really low ones. And in the case of the listing price, you get distributions that are relatively tight around what the asking price was. Now, here’s a world in which the seller doesn’t really care about the average, they just have the one house to sell, so all they care about is the top end of the distribution. And the reason I bring this up, is that perhaps if I’m a business, I’m a marketer, and my goal is to focus on only the top end of the distribution, we need to think about what the distribution is. If I’m a government and I want to focus on the lower end of the distribution, I want to make sure I leave no one behind, we need to focus on the distribution.
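The seller’s logic here can be sketched numerically. The simulation below uses invented numbers, not Soman’s actual survey data; it just illustrates how two offer distributions can share the same average while differing sharply at the top end, which is all a one-house seller cares about:

```python
import numpy as np

rng = np.random.default_rng(0)

# Offers anchored on a $1.5M listing price cluster tightly around it;
# offers built "from first principles" (the $1 listing) vary widely.
anchored = rng.normal(loc=1.50, scale=0.05, size=10_000)    # in $M
unanchored = rng.normal(loc=1.50, scale=0.30, size=10_000)  # in $M

# The averages are essentially identical...
print(anchored.mean(), unanchored.mean())

# ...but the wider distribution has a much higher top end,
# which is what matters to a seller with a single house.
print(np.percentile(anchored, 99), np.percentile(unanchored, 99))
```

The same comparison explains the government case: a policymaker who wants to leave no one behind would instead look at the bottom percentiles, where the wider distribution is much worse.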
I think that’s an element that we usually miss in our research. We report the height of bars, and that’s pretty much it. Now, to your point about heterogeneity, that’s a massively important point. And so back to another field project along the lines of the one I mentioned earlier in Mexico, we did an intervention where, like I said, we were trying to get people to make voluntary contributions to their pensions, text messaging interventions. We came up with all kinds of different interventions. The one that worked the best was a family appeal. So in most western societies, the oldest “trick in the book” is to ask people to appeal to their own future self. It turns out in Mexico, that didn’t work.
It didn’t work because appealing to your own future self was seen as selfish, it’s seen as taking away money from your existing family. And so we played around, we changed the messaging. It was now about the family’s future. That worked. And then we said, well, obviously it’s not going to work for everyone. If I don’t have a family, this shouldn’t work. And so our first impulse was to see if we could somehow look at the data and run this intervention separately for people with and without family, and people in urban versus rural, and whatever else. And if I was going to do that as a behavioral scientist, I would look at the three or four things that I think mattered, I would run a giant 2 × 2 × 2 × 2, or whatever that experiment is, and that’s not going to be efficient.
Plus when it comes to certain things like the age of marriage, I don’t know when people get married. Census data gives me age brackets, up to 25 and 25 to 37, so is that the distribution or not? We don’t know. So we ended up using machine learning, and the techniques we used were causal forests and causal trees, which are really recursive partitioning algorithms. We then found that for anyone under the age of 28, the intervention didn’t work, and for anyone above the age of 28, it did work. It made sense, that was the average marriage age at that point in time. We were able to do this by gender. Women got married at a younger age in that particular society. So these are the kinds of things we can now do, and we can say, “Here’s the group in which this intervention works the best, here’s another group in which a different intervention works the best. We can customize.”
We need to do a lot more of that. I think understanding and harnessing heterogeneity is sort of the next big thing. My colleague Tanjim Hossain and I wrote a commentary in a journal where basically, the title of the commentary is ‘Successfully Scaled Innovations Need Not Be Homogenous’. In fact, they should be heterogeneous. And so you’re absolutely right, that’s something we need to think a lot more deeply about. The one thing I will say is that with the advent of these machine-learning clustering algorithms, there is a lot more we can now do, and I think we need to actively use them in our research.
Organizational Barriers to Experimentation
Brooke Struck: Let’s talk about the challenges to doing that kind of work. So you mentioned that these algorithms are becoming more and more available, and the skills to build them and to work with them are becoming more and more widespread as well, but there are still some pretty important barriers to this kind of experimentation. One came up kind of between the lines in the story that you mentioned, which is access to good data. Good data is obviously a crucial starting point for the kind of research approaches you’re talking about here, but there are also costs to experimentation that create barriers.
And I have in mind both monetary costs and organizational costs. It’s a big psychological change to go out with hypotheses and say, “We don’t know which of these is true, and our purpose is to go and figure it out,” as opposed to saying, “I already know what the answer is, and I’m going to run a pilot test that’s just going to validate it before we scale it up.” So can you talk to us a little bit about the costs of experimentation?
Dilip Soman: Yeah. I mean, Brooke, you’ve probably been in meetings that go on and on and on in the marketing world where people are saying, “Okay, should I cut the price for a product, or should I offer some other kind of promotion?” And then there are debates and discussions and case studies being pulled out, and I sit back and say, “Why don’t we just test them out?” And you’re right, I think what prevents us from testing is a combination of the testing infrastructure, which a lot of companies don’t have, even some of the bigger ones, and the non-testing aspects: fear of failure, lack of humility. That’s a big one. The thought of “What if I test this and I find out that everything I’ve been doing for the past 20 years was not optimal?” holds back a lot of people. They have this “If it isn’t broken, why test it?” attitude.
Well, maybe it would be broken if you knew what the best outcome was. So we like to think about this construct of the cost of experimentation more generally. And it’s kind of interesting that in talking about how to best embed behavioral science, we go back to economics 101, the law of demand and supply. If something becomes cheap, the demand for the thing goes up. The demand for its complements goes up, the demand for its substitutes goes down. We know this. We’ve known this for a long, long time. It works with the price of coffee, and the consumption of cream and sugar, and so on and so forth. So I think it’s the same story with experimentation. If experimentation is cheap, we’re going to see more of it.
Now, what makes experimentation cheap? Well, let’s start with the obvious one, which is access to samples. Google has access to its customers 24/7, one degree of separation. Procter & Gamble doesn’t, and many others don’t. So one is the access. The other is the ability to randomize in an experimental sense and the ability to quickly launch experiments. Again, Google has it. I mean, I don’t know if this is true, but I’ve heard stories that people at Google don’t sit in strategy sessions and say, “Well, should I have a white background or a blue background?”; they just test it and figure out what works best. So I think they have the appetite to do that. A lot of other companies don’t; it’s not part of their culture, and they don’t have platforms to test on.
So if I’m a packaged goods manufacturer, how do I run an experiment, even one at the bottom of the evidence ladder? So one of the things we’re doing at the partnership is to try and build digital labs, or avenues for companies to collaborate and collect data. It’s early stages, but I think just reducing that cost is massively helpful. So those are all the data collection costs, and then that gets into quality of data and so on and so forth. I find the other stuff more interesting: the fact that there are organizational barriers, there are these issues of lack of humility, and there are issues of organizations not being able to follow through if in fact they get everything right. So for example, if I have the ability to spot what’s going on on a day-by-day basis, can I actually change things?
And this struck me last year when I remember, we had just gone into lockdown after the pandemic, I think three months had passed, and I’m still seeing ads of traveling to exotic vacation destinations. I’m like, what’s going on here? Obviously it’s because you buy six months worth of media, and it’s locked in place. And so if I can’t change based on what I’m seeing around me, then that’s kind of pointless as well.
So I think it is a fairly complex thing. To me, there are three legs to this whole thing. There’s obviously the data quality, all of that stuff, the ability to randomize. There is the agility, the ability to move if in fact I learn something new. And then there is the mindset, the willingness to think about everything as an experiment. And I think we need all three of these things to happen at the same time. It’s your classic three-legged stool: if one of the legs isn’t there, it’s just going to come crashing down. So it’s a great question, a great area of exploration, and there’s a lot of work to be done in that area.
Brooke Struck: In the work that you’ve done, have you seen any patterns about which of those three legs of the stool tends to be the biggest barrier or the one that people will encounter first? And again, hearkening back to the discussion we had about heterogeneity. Maybe for different organizations it’s different.
Dilip Soman: Yeah. So it is different. And I think part of it is how close you are to your end-user, how many degrees of separation you have, that’s a big variable there, but I think the mindset one is the big one for many as well. I’ve known many companies, and we’re not going to name names here, that are right next to their customer in terms of the value chain, but it’s these other things, like “What if this doesn’t work?” I think the other thing that trips a lot of organizations up is the worry about finding conflicting evidence.
So suppose I do a pilot and I learn that a blue background is better than white, and then I do a second one and I learn that white is better than blue. What am I going to do? How do I make sense of that data? Let’s just avoid all of that and not test it. So I think those, to me, are the bigger challenges. And once an organization decides that they want to do it, I think it’s only a question of figuring out how to make it happen. Like I say, if you can’t do a field trial, you could do experiments, you could do other kinds of studies, A/B surveys, or stuff like that. So to me, it’s the mindset. That’s the big one.
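The "conflicting pilots" worry often dissolves once the results are put through a basic significance check: two pilots can point in opposite directions while both gaps sit well within sampling noise. A minimal sketch using a standard two-proportion z-test, with invented numbers for Soman's blue-versus-white example:

```python
import math

def two_prop_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-test: z-statistic for rate A minus rate B."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical pilot data (blue vs. white background, 500 users each).
# Pilot 1: blue "wins"; Pilot 2: white "wins".
z1 = two_prop_z(52, 500, 45, 500)   # pilot 1: blue minus white
z2 = two_prop_z(48, 500, 55, 500)   # pilot 2: blue minus white
print(round(z1, 2), round(z2, 2))   # neither reaches |z| ≈ 1.96
```

Since neither pilot clears the conventional |z| ≈ 1.96 threshold, the honest reading is "no detectable difference yet, collect more data", not "the evidence conflicts", which is a much less threatening conclusion to bring back to an organization.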
How Organizations Can Embrace Experimentation
Brooke Struck: Okay. For organizations that are looking to move more in the direction that we’re advocating here, what are the early steps that they can take to start making progress on that?
Dilip Soman: I think the first thing they can do is to simply document past successes and failures, and try to explain the variation through what we call SIFs, ‘Supposedly Irrelevant Factors’. And I’ve done this with many organizations. I remember one particular case where there was a particular intervention that we were trying: it worked like a dream, then we tried it again and it just stopped working, then we tried it again and it worked like a dream. And so we were trying to make sense of what was happening to the experiment: why is it working in some instances and not others? At that stage, I asked my team to get together, and we sat down and discussed every single thing that we could about each of those interventions, like when was the study run, what month of the year? And once we did that, we realized there was a pattern there. Things that were run in the summer didn’t work; things that were run in the other season we have in Canada (laughter), worked.
Brooke Struck: The other season!
Dilip Soman: To cut a long story short, this was an intervention about getting people to act sooner on stuff. It was an intervention based on the theory that when people are busy, they use certain prompts. And you know how busy people are in the summer. That mindset that end-users are required to have for the theory to work didn’t exist. But it was only after about a year and a half of systematically cataloging when things happened, and what was different, that we realized that. And I think a lot of companies miss that. There’s so much data sitting there. It’s just a matter of looking back at your databases and trying to figure out: is there a generalizable empirical pattern? Are there seasonal effects? Are there geographic effects? Are there certain kinds of distribution effects? This would be the first thing.
And then once you have those hypotheses, then I think it’s time to visit the nudge store, not just to pull things off the shelf, but as you said, develop a bunch of hypotheses to see how we can best correct for those imbalances. And so I think the answer is it’s all there, but really that mindset of saying, “Let me look at the data and try and find patterns”, is key.
Brooke Struck: Yeah. When you mentioned cataloging the successes and failures, one of the things that came to mind was a conversation I was having yesterday actually about the fact that when you’ve got very fluid processes within an organization, you’ve got very heterogeneous artifacts, and essentially you don’t have data yet, you’ve just got a bunch of stuff. It’s when you start to standardize the processes internally that things become more comparable, they become more structured.
If you can put stuff into an Excel database, and it’s more clear what the column headers should be, and what the information contained in each column should be, starting to structure your internal processes to be able to capture information in a way that’s comparable seems to be an important ingredient there. Because if you’ve just got a whole bunch of heterogeneous artifacts that are not characterized along dimensions that you want to be looking at later, then that’s going to be your first mountain to scale.
Dilip Soman: Yeah, absolutely. And I think we can even go one step back. When you train as a social scientist, you have a very clear mental model of identifying what’s the cause, what’s the effect, what’s the mediator, and what’s the moderator: what is it that we are changing to create what outcome change, when does it work and when does it not (the moderator), and through what route (the mediator). And I don’t think we do that enough in our field work. Sometimes we’ll try stuff, and it has an effect, but it’s not the effect that we want, but then we have to write an impact report at the end and we say, “You know what? We always wanted to do this other thing.” I think just having the discipline of that mental model, of asking “What am I changing? To what effect? And under what conditions do I expect it to work?”, is something we need to do a lot more of.
I mean, we call it the behavior change challenge statement, but just very clearly, it’s trying to understand what behavior change you want to engineer, is helpful. What are people doing right now? What do you want them to do? Because oftentimes… And you’ve probably seen this more than I have, you get behavior change challenges that are articulated at such an abstract level, like “We want people to be more engaged”. Well, that’s great. What would they do differently if they were engaged? So I think that precision is missing. And I think to the extent we can develop the habit of being precise, then I think the data can speak to that as opposed to, as you said, a lot of nebulous conversation.
Brooke Struck: There’s a great discursive trick that one of our previous guests from probably about a year ago now, Matt Wallaert, talked about. He often encountered the same problem that you’re describing: “I want my employees to be more engaged”, or “I want my customers to have more brand recognition”, or this kind of thing. And Matt’s approach there was, “Okay, well suppose there are these two parallel universes, one that contains Matt, and one that contains Matt prime. Matt is completely disengaged, Matt prime is super engaged, but in all other ways they’re exactly alike, they don’t behave any differently. Do you care whether I’m Matt or Matt prime?” He used this as a bit of a foil to push people to say what it is that people would do that would be observable, that would be concrete, that would actually be the thing that you care about.
Dilip Soman: Yeah. So when I teach, for example, in the executive programs or on the MBA programs, one of the other things I push people to do is to not just identify the behavior, but make sure it is a singular behavior, it cannot be decomposed any further. What I mean by that is, for example, somebody comes in and says, “I want people to spend more in my stores.” I think that’s observable, it’s a behavior. Do you want them to spend more by buying higher quality products, or just by buying more volume? And then they say, “Well, it doesn’t matter.” I say, “It does.” Because the psychological process used to get these two outcomes is different. And so I think really drilling down to the basic unit of behavior change, is key. And I like Matt’s approach, and I think that’s just pushing to make sure that it is precise, it is observable, and then we can understand what the psychological mechanism is.
Things Individuals Can Do to Catalyse Change in Organizations
Brooke Struck: So we talked about organizations and what the barriers are for organizations to change, but one of the things that I always find myself needing to remind myself of is that organizations don’t make decisions; people make decisions in an organizational context. So if we look now at a different unit of analysis, if we look at the individual, for someone who’s listening to this and says, “Oh my gosh, this is exactly the thing my organization has needed,” what can they do to start catalyzing change?
Dilip Soman: So that’s an interesting question, because I think at the end of the day, I could define the “What do they need to do” at an abstract level, but in the spirit of what we just said, let’s try and be precise. So what are we really arguing for here? We are arguing for a personal decision-making system which is as bias-free as possible, which brings in all the information to the table, which recognizes that other people might not interpret the information in the same way. So that’s a tall order, and I think it’s really important to kind of break that down. So there’s obviously the personal stuff. I know people are overconfident, am I perhaps falling into the same trap? I know people use motivated reasoning, am I doing the same thing, et cetera.
It sounds easier than it is in practice, but I think that’s something people can accomplish by giving themselves feedback over time. In the overconfidence research, there’s a classic finding that if you compare overconfidence across professions, it turns out the least overconfident people are weather forecasters. They may not always be accurate, but they’re not overconfident about their accuracy, and it’s because they get feedback every single day. If I’m a physician, on the other hand, or a rocket scientist, I don’t get much feedback. If I’m a doctor and somebody comes to me with a headache, I’ll perhaps prescribe pills, and then if all goes well, they never come back. There might be other reasons why they don’t come back.
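The "weather forecaster" discipline Soman points to, writing predictions down and scoring them against outcomes, is easy to mechanize. A minimal sketch with hypothetical data: log each claim with a stated confidence, then compare each confidence level with the fraction of claims that actually came true.

```python
# Hypothetical personal prediction log: (stated confidence, came true?).
# The function and data are illustrative, not from the episode.
def calibration_report(log):
    """Group logged predictions by stated confidence and return the
    observed hit rate for each confidence level."""
    buckets = {}
    for confidence, came_true in log:
        buckets.setdefault(confidence, []).append(came_true)
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}

log = [(0.9, True), (0.9, True), (0.9, False), (0.9, False),
       (0.6, True), (0.6, False), (0.6, True)]
report = calibration_report(log)
print(report)  # the "90% sure" claims only came true half the time
```

A gap like this (claiming 90% confidence but being right 50% of the time) is exactly the overconfidence signal that daily feedback gives forecasters and that most professionals never see.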
So I just don’t get any feedback. And so I think giving yourself feedback I think is critical. So it goes back to the discipline of actually writing down things, systemizing it, but then the process, I think that’s another big one. So for example, we worked with a partner where they said, “In developing proposals for our internal funding for different projects, the proponent of each project is supposed to consult at least three or four other people and incorporate their feedback in the proposal,” and we don’t think people do that.
So a simple behavior change thing: people aren’t consulting others, and we should. And I don’t know if we solved it, but we addressed it by a simple restructuring of the form. Instead of just a big block of text asking, “How did you incorporate any feedback?”, we asked them, “What are the top three things that you heard, and what’s your response?” And just the act of being precise about the feedback, I think, increased the likelihood that people went out and got that feedback.
I think there’s a lot of things that we can change simply by changing our internal processes or our reporting processes. Just asking people why their belief in the fact that a particular intervention will work might be wrong or when might it not work, is a fantastic intervention. So I think we need to do more of that within organizations. But again, it’s one step at a time. I think fixing yourself, I think is a great first step before we can start fixing others.
Brooke Struck: Yeah. I think that’s such a nice note to end on. Dilip, thank you so much for this conversation, for taking your time and sharing your insights with us today.
Dilip Soman: Absolute pleasure. And thank you to everyone for listening in. This was wonderful, because like I say, it’s something that I’ve been thinking about a lot these days.
Brooke Struck: All right. And we hope to speak with you again soon.
Dilip Soman: Fantastic.