The Basic Idea
Have you ever witnessed a showdown between a toddler begging for a toy and their parent? The rugrat can employ dozens of ruthless tactics, even unleashing the ultimate humiliation – a mid-mall tantrum. As a bystander, you think, “Just buy him the toy, anything to make this stop!” However, as experienced parents know, this surrender will signal to the child that bad behavior gets them what they want. In other words, the toy will act as a positive reinforcer for bad behavior. In behavioral psychology, positive reinforcement is a key concept in operant conditioning, a powerful form of learning in which the consequences of a behavior determine how likely it is to occur in the future.1
Operant conditioning: A learning process in which the consequences of behavior impact whether it will be repeated in the future.2 Operant refers to intentional action.
Positive reinforcement: When a consequence of a behavior increases the strength of that behavior. “Positive” refers to the addition of a stimulus, while “reinforcement” refers to the increase in behavior.
Behaviorism: A movement in psychology that emphasized the study of observable processes (behavior) rather than subjective mental states.3
In the early 1900s, many theories about the human mind and behavior weren’t tested experimentally. To make psychology more scientific, some researchers suggested that psychologists focus on the study of observable and quantifiable processes. These researchers sparked a movement now known as behaviorism. A key contributor to this revolution was E.L. Thorndike. Thorndike mainly studied animal learning processes by observing how cats learn to escape puzzle boxes. His findings culminated in his 1911 book Animal Intelligence, which grew out of his doctoral dissertation and outlined the law of effect.4 This law stated that when a behavior performed in the presence of a stimulus is followed by something pleasant, the behavior is likely to be repeated when the stimulus is present again. If the behavior is followed by an unpleasant consequence, it is likely to diminish in response to the stimulus. Although this conclusion may seem obvious now, Thorndike’s research was highly innovative at the time, as it emphasized quantifiable analysis. His work on the law of effect is widely considered the first laboratory study of learning, and it set the stage for further investigations in this area of research.
In the 1930s, a young psychologist named B.F. Skinner greatly expanded upon Thorndike’s research, going so far as to create his own puzzle box, later dubbed the “Skinner box”.5 This smaller box, occupied more often by rats than cats, was used to study how animals’ behavior changed in response to different consequences. Based on this research, Skinner defined operant conditioning as the strengthening or weakening of behavior due to its consequences.
He also identified four types of operant learning, which are sometimes referred to as contingencies. Two of these types strengthen behavior (positive reinforcement, negative reinforcement), and the other two weaken behavior (positive punishment, negative punishment).5 The most well-known contingency, positive reinforcement, refers to the increased likelihood that a behavior will recur when it is followed by the addition of a stimulus. This stimulus is a positive reinforcer, and it often involves something pleasurable, which is why some researchers refer to this form of learning as ‘reward learning’. However, Skinner emphasized that what makes a stimulus a positive reinforcer is that it strengthens a behavior, not that it causes subjective pleasure. For example, in certain situations, highly unpleasant electrical shocks can lead to an increase in behavior. Additionally, what one person finds rewarding may be completely unpleasant to another.
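The four contingencies can be read as a two-by-two grid: whether a stimulus is added or removed, and whether the behavior is strengthened or weakened. The sketch below is only an illustration of that naming scheme; the enum and function names are hypothetical, and the descriptions of the punishment contingencies follow the standard textbook usage rather than anything spelled out above.

```python
from enum import Enum


class StimulusChange(Enum):
    """What happens to the stimulus after the behavior (illustrative names)."""
    ADDED = "positive"
    REMOVED = "negative"


class BehaviorChange(Enum):
    """What happens to the behavior over time (illustrative names)."""
    STRENGTHENED = "reinforcement"
    WEAKENED = "punishment"


def contingency(stimulus: StimulusChange, behavior: BehaviorChange) -> str:
    """Name the contingency from the stimulus change and the behavior change."""
    return f"{stimulus.value} {behavior.value}"


# The toddler gets the toy (stimulus added) and tantrums become more frequent.
print(contingency(StimulusChange.ADDED, BehaviorChange.STRENGTHENED))
```

Note that the label depends on the observed effect on behavior, not on whether the stimulus is pleasant, which matches Skinner's emphasis.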
Allen Neuringer later emphasized that positive reinforcement doesn’t just increase the likelihood that a behavior will recur; it can also affect other aspects of the behavior, such as its duration or persistence. One experiment, conducted by Page and Neuringer on pigeons, demonstrated that even random behavior can be taught through positive reinforcement. They showed this by providing reinforcers to pigeons only when their pecking pattern was different from all previous trials.6
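The rule used in the pigeon experiment, reinforcing a pecking pattern only when it has not appeared before, can be sketched in a few lines. This is a simplified illustration and not the published procedure: representing a trial as a string of left ("L") and right ("R") pecks, and the function names, are assumptions made for the example.

```python
def make_novelty_contingency():
    """Return a function that reinforces a peck sequence only if it is new."""
    seen = set()  # every sequence produced on earlier trials

    def reinforce(sequence) -> bool:
        key = tuple(sequence)      # accept strings or lists of pecks
        is_new = key not in seen
        seen.add(key)              # remember it for later trials
        return is_new

    return reinforce


reinforce = make_novelty_contingency()
for trial in ["LRLR", "LRLR", "RRLL"]:
    print(trial, "reinforced" if reinforce(trial) else "not reinforced")
```

Under this rule, repeating an old pattern never pays off, so the only way to keep earning reinforcers is to keep varying the pattern.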
Edward Lee Thorndike
Thorndike began his academic career at Wesleyan University, studying literature. He later transitioned to psychology, which he studied at both Harvard and Columbia.4 During his time at Harvard, he began to study learning in animals.5 Because there was no laboratory space for this innovative research, Thorndike was forced to keep his furry subjects in his room until William James kindly offered him a basement space.5 Eventually, Thorndike was offered a fellowship at Columbia, where he wrote his famous dissertation articulating the law of effect. After becoming a professor, Thorndike worked to transfer what he had learned in his experimental research on learning to the school system, and published several books on how to apply psychology to teaching.4
Burrhus Frederic Skinner
B.F. Skinner is widely considered the Darwin of behavioral science and the most eminent psychologist of the 20th century. Although he originally aspired to become a writer, the work of physiologist Ivan P. Pavlov on classical conditioning and John B. Watson on behaviorism inspired Skinner to pursue psychology.5,7 He received his Ph.D. from Harvard, and would later be credited with various improvements to the methods used to study learning. He proposed using the rate of behavior change as a measure of learning, devised the Skinner box, and greatly advanced the use of scientific methods in psychology.5,7 Skinner’s research and non-academic works, such as his utopian novel Walden Two, have inspired great controversy, but few critics would deny his importance in advancing the field of psychology.
Several variables can affect the success of positive reinforcement, as well as the other forms of operant conditioning. The first variable is contingency, the likelihood that a reinforcer will follow a behavior.5 If rewards are rarely given after strenuous behaviors, we aren’t likely to expend precious energy performing such behaviors. Another variable is the amount of time between a behavior and its reinforcer. Usually, the quicker the reward is offered, the faster the subject learns. If too much time passes, the wrong behavior might be reinforced. The type, quantity, and quality of the reinforcer matter too. Even rats have food preferences – they learn better when offered bread over seeds!5
Different schedules can be used to deliver positive reinforcement, and the choice of schedule heavily affects learning.8 One option is a continuous schedule, where a reward is given each time a behavior is performed. Another is an intermittent schedule, which provides a reward only after the behavior has been repeated a certain number of times or after a set interval of time has passed.
Intermittent rewards can also be delivered variably, meaning that the reinforcer is given after a varying amount of time (e.g., every three to seven minutes) or after a varying number of responses (e.g., every two to five pecks). Each reinforcement schedule can be effective in different scenarios: a boss paying their employees on a variable schedule would cause chaos, yet the same schedule is thrilling for slot-machine gamblers at the casino.
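The schedules described above can be sketched as two small classes: one that counts responses and one that tracks elapsed time. This is a minimal illustration, not a model taken from the cited sources; the class names, the `respond` method, and the use of minutes as the time unit are all assumptions made for the example.

```python
import random


class RatioSchedule:
    """Reinforce after a required number of responses (hypothetical helper).

    low == high == 1 is continuous reinforcement, low == high > 1 is a fixed
    ratio, and low < high is a variable ratio.
    """

    def __init__(self, low, high, rng=None):
        self.low, self.high = low, high
        self.rng = rng if rng is not None else random.Random()
        self.count = 0
        self.required = self.rng.randint(low, high)

    def respond(self, t=0.0):
        """Register one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count < self.required:
            return False
        self.count = 0
        self.required = self.rng.randint(self.low, self.high)
        return True


class IntervalSchedule:
    """Reinforce the first response made once the required wait has elapsed.

    low == high is a fixed interval; low < high is a variable interval.
    """

    def __init__(self, low, high, rng=None):
        self.low, self.high = low, high
        self.rng = rng if rng is not None else random.Random()
        self.last_reward = 0.0
        self.wait = self.rng.uniform(low, high)

    def respond(self, t):
        """Register a response at time t (minutes); return True if reinforced."""
        if t - self.last_reward < self.wait:
            return False
        self.last_reward = t
        self.wait = self.rng.uniform(self.low, self.high)
        return True


# Fixed ratio of five: only every fifth peck pays off.
fixed = RatioSchedule(5, 5)
print([n for n in range(1, 11) if fixed.respond()])  # [5, 10]

# Variable ratio of two to five pecks: the pay-off points are unpredictable.
variable = RatioSchedule(2, 5, random.Random(0))
print([n for n in range(1, 21) if variable.respond()])
```

With a variable schedule the subject cannot tell which response will pay off, which is the property the slot-machine comparison relies on.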
All this research conducted by Thorndike, Skinner, and their successors has not been limited to the lab. Positive reinforcement has been applied to parenting, teaching, addiction, economics, organizational behavior, and several other fields. Nudge theory, for example, is based on the idea that positive reinforcement and other non-forceful methods can successfully promote healthy decisions and behavior.9
Operant conditioning and positive reinforcement have attracted criticism since their inception. Some early critics claimed that Skinner’s perspectives were reductive, as they denied the role of biology, thoughts, feelings, and autonomy in behavior.5 Supporters believe that this criticism misrepresents his ideas, and emphasize that his methods were crucial to making psychology an evidence-based, scientific field. In the modern era, fields such as cognitive science and neuroscience can address “the black box” of the mind, which was considered inaccessible when positive reinforcement was first studied. As a result, modern criticisms focus on specific flaws in the methods and evidence behind positive reinforcement theory, instead of dismissing the concept as a whole.
Education researchers Scott and Landrum addressed some of these contemporary criticisms.10 Although their article mainly responds to criticisms of positive reinforcement in the context of teaching, their rationales can apply to other fields. For example, one common criticism of positive reinforcement is that it is not backed by research. To counter this claim, the authors show that meta-analyses support the use of positive feedback in educational settings. Evidence in this field may appear conflicting because the type of reinforcer and many other factors can influence the success of positive reinforcement. Another criticism is that reinforcement inhibits creativity. The authors counter that creativity involves adapting skills to authentic situations, which would not be possible if the skill hadn’t been acquired in the first place (a process that positive reinforcement facilitates). After refuting several other criticisms, the authors conclude that positive feedback in schools is overwhelmingly supported by scientific data, and they encourage readers to read and trust the scientific literature rather than pop psychology. Clearly, sweeping statements about positive reinforcement, whether critical or celebratory, should be examined with caution.
Encouraging Hand Hygiene Using Feedback Technology
A study conducted at a New York intensive care unit examined whether positive feedback could improve healthcare workers’ hand hygiene practices. The intervention involved installing sensors in the doorways to patient rooms, as well as cameras that recorded the sinks and hand sanitizers. External video auditors monitored whether the healthcare workers performed hand hygiene. Electronic boards then gave workers immediate feedback on how their shift was doing (e.g., “Great shift! 92% success”), supplemented by email summaries and weekly performance reports. The researchers found that compliance didn’t rise after the initial camera monitoring began, but did improve once the feedback was provided: the hand hygiene rate rose from 10% in the pre-feedback period to 81.6% in the post-feedback period. Although this technology was costly, the researchers believe the investment will be justifiable if future studies show a reduced rate of hospital-acquired infections.
Positive Reinforcement and Social Media Addiction
The popular Netflix documentary The Social Dilemma explains how social networking sites take advantage of intermittent positive reinforcement to make their platforms more addictive. To explain how this works in practice, Tristan Harris, a former design ethicist at Google, likens social media to slot machines in Vegas.12 When we get a match on Tinder, receive a like on Instagram, or see that our favorite YouTuber has uploaded a video, we get a hit of dopamine as a reward. Like slot-machine payouts, these rewards are rare but frequent enough to keep us hooked. Because the positive reinforcer arrives at relatively random moments, we are encouraged to engage with these applications constantly.
Related TDL Content
While positive reinforcement involves adding a stimulus to increase a behavior, negative reinforcement involves removing a stimulus to increase a behavior. These two concepts often co-occur in real-life instances of operant conditioning. TDL’s article on negative reinforcement covers this sister term of positive reinforcement and its applications to addiction.
- Poling, A., Carr, J. E., & LeBlanc, L. A. (2002). Operant conditioning. Encyclopedia of Psychotherapy, 271–287. https://doi.org/10.1016/b0-12-343010-0/00154-9
- Quickel, E. J. (2020). Operant conditioning. Encyclopedia of Personality and Individual Differences, 3340–3342. https://doi.org/10.1007/978-3-319-24612-3_987
- Dictionary.com. (n.d.). Behaviorism. Dictionary.com. https://www.dictionary.com/browse/behaviorism
- Encyclopedia Britannica, Inc. (n.d.). Edward L. Thorndike. Encyclopedia Britannica. https://www.britannica.com/biography/Edward-L-Thorndike#ref253227
- Chance, P. (2014). Learning and Behavior (7th ed.). Cengage Learning.
- Page, S., & Neuringer, A. (1985). Variability is an Operant. Journal of Experimental Psychology: Animal Behavior Processes, 11(3), 429–452. https://doi.org/10.1037/0097-7403.11.3.429
- Encyclopedia Britannica, Inc. (n.d.). B.F. Skinner. Encyclopedia Britannica. https://www.britannica.com/biography/B-F-Skinner
- Boundless. (n.d.). Boundless psychology. Lumen. https://courses.lumenlearning.com/boundless-psychology/chapter/operant-conditioning/
- Scott, T. M., & Landrum, T. J. (2020). An evidence-based logic for the use of positive reinforcement: Responses to typical criticisms. Beyond Behavior, 29(2), 69–77. https://doi.org/10.1177/1074295620917153
- Armellino, D., Hussain, E., Schilling, M. E., Senicola, W., Eichorn, A., Dlugacz, Y., & Farber, B. F. (2011). Using high-technology to enforce low-technology safety measures: The use of third-party remote video auditing and real-time feedback in healthcare. Clinical Infectious Diseases, 54(1), 1–7. https://doi.org/10.1093/cid/cir773
- Marciano, J. (2020, September 15). How social media hacks our psychology. BetterMarketing. https://bettermarketing.pub/how-social-media-hacks-our-psychology-9f901f55e54a