In March 2018, Tesla’s second fatal crash involving its Autopilot self-steering system happened on Highway 101 in Mountain View, California (The Guardian Staff 2018). Collision reports showed that the driver, Apple software engineer Wei Huang, had received both visual and auditory cues from the self-steering system before his vehicle crashed into a concrete median, killing him. Reportedly, Huang had 150 meters of the median in view, or five seconds in which to react and avoid the barrier, had he been paying full attention to the situation at hand.
Although autonomous vehicle systems have saved more lives than they have cost (Marshall 2017), should we expect more incidents like these to occur as production continues? What is more, does the fact that accidents still occur with autonomous self-steering systems (which are designed to improve driver safety) call for a deeper investigation into the relationship between hazard perception, automated cues, and multitasking?
Although they represent an important part of technological advancement, autonomous vehicles still introduce disturbances for drivers, who may otherwise view them as a way to kick back and direct their attention elsewhere. Putting such trust into driver assistance design can introduce drivers to a dangerous amount of risk, instead of making driving easier and safer. According to behavioral science, this increased capacity to multitask behind the wheel may bring further problems for other drivers and road safety in general, as studies show that our cognitive decision-making systems aren’t as sophisticated as we may think.
To mitigate these risks, the autonomous vehicle industry may benefit from these behavioral science insights and from learning more about the driver’s cognitive architecture and decision-making processes. By understanding when, where, and how drivers multitask most effectively, the industry can help design policies and technological interventions that enhance synchrony in the autonomous transportation realm.
The Cognitive Bottleneck
To understand the risks behind autonomous driving systems, the relevant dynamics of attention processing and multitasking must first be understood. This is where the field of behavioral science comes in. The discipline’s scientific analysis of human decision-making offers the autonomous transportation sector important insights into how people act when their attention is divided among multiple environmental stimuli. This is often the case in a vehicle that encourages multitasking, for example watching the road and safety cues while writing an email or answering a phone call.
Our cognitive system often allows us to process multiple components of a task in such circumstances, but only to a limited extent. Once these parallel streams of incoming information must narrow and converge at a central “bottleneck”, we become unable to react to one or both of the sources of information, which in the worst case could include hazards in the road ahead.
Research suggests that we aren’t hardwired to do both things at once in all circumstances. Knowles (1963) proposed that the human mind can only perceive as much as its pool of cognitive resources allows, and that we get much worse at reacting to additional stimuli as we near full cognitive capacity. In the context of driving an autonomous vehicle, fully attending to a secondary task, such as looking at a phone or laptop screen, uses cognitive resources that should otherwise be allocated to the primary task of driving, particularly in critical situations.
A Question of Time?
So why is the human brain less efficient at making decisions when the cognitive bottleneck is busy processing parallel streams of incoming information? One approach asks whether it has something to do with the time difference between the points at which the different streams are processed. Welford (1952) examined the relationship between time and decision-making within the context of a response bottleneck, and sought to understand why he found that people were unable to respond to two discrete stimuli when they were separated by an interval of less than 500 ms.
He attributed this brain lag to the “psychological refractory period”, a theoretical segment of time when we’re unable to respond to a second task until we’ve finished responding to the first one. This model theorizes that our brains use what is known as serial processing (where we process one stimulus at a time), which depletes our cognitive resource pool much faster than parallel processing (where we can process multiple stimuli at a time).
Under this model, multitasking in autonomous vehicles raises the possibility of dangerous accidents, as drivers on their phones or looking elsewhere will not be able to respond to any hazard warnings until they have finished processing the task they are already engaged in. In many cases, where dangers arise quickly, it may already be too late by the time the driver has completed the first task and their attention is free to focus on the road ahead.
However, this model of serial processing disregards multitasking as a possibility, and states that in order to pay attention to a second task, we must always finish attending to the first task. This makes serial processing rather streamlined, but entirely inefficient when it comes to multitasking. A model of parallel processing, on the other hand, states that our brains can efficiently process dual tasks: rather than allocating resources in an “all or nothing” way, we can share them across multiple resources with multiple bottlenecks (Fischer and Plessow 2015).
This alternative model accounts for the extent to which we can actually process multiple tasks in a way that we perceive as being seamlessly simultaneous. For instance, we can listen to the radio while merging lanes or have a conversation with a friend when making a left turn. Either type of processing, be it serial or parallel, can only take us so far, though. New insights suggest that it becomes more difficult to switch tasks or multitask when mappings between stimuli and responses are mismatched, or “incongruent.” Parallel processing, which is the type of processing used for multitasking, and serial processing, the type of processing used for task switching, both occur at optimal levels when stimuli and/or responses are congruent with one another.
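The timing argument can be made concrete with a minimal sketch of a central bottleneck. It assumes the common textbook decomposition of each task into a perception stage, a central decision stage, and a response stage, with only the decision stage restricted to one task at a time. The function name and all stage durations are hypothetical values chosen for illustration; they are not figures from Welford (1952).

```python
def second_task_reaction_ms(gap_ms, perceive1=100, decide1=150,
                            perceive2=100, decide2=150, respond2=100):
    """Reaction time to a second stimulus that arrives gap_ms after the first.

    Hypothetical stage durations in milliseconds. Only the decision stage is
    treated as a bottleneck: task 2 cannot start deciding until task 1 has
    finished deciding.
    """
    bottleneck_free_at = perceive1 + decide1      # task 1 leaves the bottleneck
    ready_to_decide_at = gap_ms + perceive2       # task 2 has been perceived
    decide2_starts_at = max(bottleneck_free_at, ready_to_decide_at)
    response_at = decide2_starts_at + decide2 + respond2
    return response_at - gap_ms                   # measured from stimulus 2 onset


for gap in (0, 50, 150, 500):
    print(gap, second_task_reaction_ms(gap))
```

With these made-up durations, the second response is slowest when the two stimuli arrive together and returns to its undisturbed baseline once the gap is long enough for the first decision to have cleared the bottleneck.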
Solving the Issue of “Crosstalk”
To get an idea of what congruence means in this context, consider Hommel (1998), who demonstrated the consequences of responses being incongruent with one another, referred to as crosstalk effects. His experiment involved a dual-task paradigm designed to investigate the effect of crosstalk on processing efficiency. The stimuli for the first task were colors of letters, to which people had to respond manually on a keyboard (left arrow for red and right arrow for green). The stimuli for the second task were identities of letters, to which people had to respond verbally (saying “right” for an S or “left” for an H). When the responses were incongruent (left/“right”), people were a lot slower because of increased crosstalk, but this effect was reversed when responses shared the same conceptual category (left/“left” or right/“right”).
The explanation for this can be reasoned within the parameters of a multiple resource bottleneck model. Parallel processing for tasks that require different resources (e.g. manual and verbal response resources) shows the least amount of crosstalk when there is dimensional overlap, or when responses share the same conceptual category. What this means is that, when driving autonomous vehicles, multitasking difficulties don’t merely arise from the issue of having to process multiple stimuli at once, like changing lanes and having a conversation, but specifically from using the same cognitive representational coding resources (Wickens 2002), such as turning at an intersection and identifying road signs.
Say you’re being driven by your autonomous car but you’re on your phone. It becomes a lot harder to avoid a potential hazard when your responses are not being primed in an appropriate way. The responses you have to your phone are not the same responses you need on the road. This incongruency creates cognitive delay in the form of lowered situational awareness, which can be dangerous for autonomous drivers.
In 2014, Strand et al. observed that drivers of fully autonomous vehicles experienced twice as many collisions as drivers of semi-autonomous vehicles. Also, compared to manual drivers, autonomous drivers can take 70% longer to overtake a lead vehicle and 2.5 seconds longer to hit the brakes in reaction to a red traffic light (Radlmayr et al. 2014; Merat and Jamson 2017).
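To give a sense of scale, a 2.5-second delay can be converted into distance travelled before braking even begins. The sketch below is a back-of-the-envelope calculation only. The two speeds are assumptions chosen for illustration: 30 m/s is the speed implied by the 150 meters in five seconds reported for the Mountain View crash, and 50 km/h is a hypothetical urban speed. Neither comes from the braking studies cited above.

```python
def extra_distance_m(speed_m_per_s, delay_s):
    """Distance covered at constant speed during a reaction delay."""
    return speed_m_per_s * delay_s


BRAKING_DELAY_S = 2.5            # extra delay reported for autonomous drivers

highway_speed = 150 / 5          # m/s, implied by the Mountain View figures
urban_speed = 50 / 3.6           # m/s, hypothetical 50 km/h

print(extra_distance_m(highway_speed, BRAKING_DELAY_S))        # 75.0
print(round(extra_distance_m(urban_speed, BRAKING_DELAY_S), 1))  # 34.7
```

Under these assumptions, the delay alone adds roughly 35 to 75 meters of travel before the brakes are touched.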
This is especially worrying as many individuals see these cars as futuristic hubs of enhanced productivity. A 2014 survey asked U.S. respondents what kind of activities they would likely engage in while in an autonomous vehicle (Schoettle and Sivak 2014). Only 36% of people said that they would not take their eyes off the road. About 10% of respondents said they would read or text and talk with friends and family, up to 6% said they would work or watch movies, and 7% trusted the vehicle enough to sleep en route to their destination. It’s true that an autonomous vehicle could free up driving time and unleash the potential for multitasking. But how can we be smart and safe about making this multitasking dream a reality?
The Solution: Cognitive Ergonomics
Understanding the mechanics of central bottleneck processing is crucial in the context of human-machine interface (HMI) design. Careful thought needs to be put into the design of complementary HMIs so that drivers can safely focus on secondary tasks. It’s no mystery that things like reading or talking on a cell phone interfere with a driver’s ability to focus and respond to unexpected situations on the road (Levy, Pashler, and Boer 2006).
Fortunately, research has already been devoted to understanding optimal HMI design. Predictive cues, for instance, can be incorporated into HMIs to enhance a driver’s attention to important details. When the car’s sensor system detects and predicts environmental obstacles, explicit cues can aid a driver in recognizing pedestrians, cyclists, automobiles, road signs, and other items.
These cues can be programmed into the car’s software and can span the visual, auditory, or tactile modalities (Broeker et al. 2017). For instance, simulated driving experiments have shown drivers to optimally integrate auditory cues that are presented at the same time and in the same region of space as visual areas of interest, resulting in enhanced attention and driving performance (C. Ho and Spence 2005; Steenken et al. 2014).
Vibrotactile warning cues have also been shown to be particularly effective in spatially orienting a driver to a visually relevant obstacle, such as the sudden deceleration of a lead car (C. Ho, Reed, and Spence 2006). Vibrotactile warning cues are especially helpful when paired with a simultaneous auditory cue (C. Ho, Reed, and Spence 2007). Though numerous autonomous car companies already use simple versions of visual, auditory, and tactile warning cues, more can be done to enhance the cognitive ergonomics between driver and car (Beattie, Baillie, and Halvey 2017).
Multisensory integration reduces cognitive workload because rather than taking data from a single cue, the brain uses redundant data from multiple cues and parallel streams of incoming information to provide the most reliable estimate for perceptual discrimination (Ernst 2006). Multisensory integration takes advantage of the brain’s ability to simultaneously process different yet congruently programmed cues, thereby sidestepping the shortcomings of cognitive serial processing.
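One standard way to formalize “the most reliable estimate” from redundant cues is reliability-weighted averaging, in which each cue is weighted by the inverse of its variance. The sketch below assumes that formalization, along with independent, unbiased cues; the article itself does not specify a formula. The function name and the example numbers are hypothetical.

```python
def integrate_cues(estimates, variances):
    """Combine independent cue estimates, weighting each by 1 / variance.

    Returns the combined estimate and its variance. Assumes unbiased,
    independent cues (an idealization, not a claim about real drivers).
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    combined = sum(r * x for r, x in zip(reliabilities, estimates)) / total
    return combined, 1.0 / total


# Hypothetical example: bearing of an obstacle in degrees.
# The visual cue is more precise (variance 4) than the auditory cue (variance 16).
bearing, variance = integrate_cues([10.0, 20.0], [4.0, 16.0])
print(bearing, variance)
```

In this made-up example the combined bearing sits closer to the more reliable visual cue, and the combined variance is lower than that of either cue alone, which is the sense in which redundant cues yield a more reliable estimate.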
Best Paired with Cues
Certain types of cues work best when incorporated into augmented reality technologies or HMIs for assisted driving. Drivers of autonomous vehicles prefer a few spatially predictive natural sounds to many omnidirectional abstract sounds (Fagerlonn and Alm 2010; C. Ho and Spence 2005), even though the latter are the standard in the autonomous car industry. If a car is tailgating you, it’s much easier to respond to this event if your own car alerts you with a localized cue at the rear rather than anywhere else in space; the dimensional congruency offered by the localized cue takes advantage of the efficiency provided by the brain’s parallel processing, as suggested by the multiple resource model (C. Ho and Spence 2005; D. C. Ho and Spence 2012).
A driver’s situational awareness can be further enhanced by incorporating auditory cues that mimic the natural sounds of external events, such as crashes, as opposed to simple beeps (van der Heiden, Iqbal, and Janssen 2017). Cues can be especially helpful if drivers are asked to voluntarily translate symbols, such as road signs, to real meanings in their mind. This form of endogenous cueing, which involves controlled and voluntary cognitive processes, can help prime a driver’s attention to predict obstacles and focus on important tasks for a longer period of time (C. Ho, Reed, and Spence 2007; Talsma et al. 2010; Lee, Lee, and Boyle 2009).