Recurrent Neural Networks

The Basic Idea

Artificial neural networks give computers the ability to solve complex problems and make intelligent decisions in a way that very loosely resembles how our human brains work. These networks are key to the advanced deep learning capabilities that are revolutionizing fields like language processing and data forecasting, and one type in particular excels at these tasks.

Recurrent neural networks (RNNs) are a type of artificial neural network designed to process sequential data by using an internal memory to recall previous information.

What does this mean, exactly? This type of neural network is ideal for processing data that occurs in a specific order, such as words, sentences, or information organized by time intervals like financial data or weather information.1 A key characteristic of RNNs is their ability to remember previous information when processing current information, using a built-in memory to make future predictions.

RNNs achieve this through the use of a hidden state, which serves as a memory bank that retains information from previous data points, or time steps, in a sequence of data. At each time step, the RNN updates its hidden state to blend the current input with previous information, then generates an output. The updated hidden state is carried forward to the next time step, and so on.
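
To make the hidden state a little more concrete, here is a minimal sketch of the update described above. It assumes a toy RNN in which the input, the hidden state, and the weights are all single numbers; the names `rnn_step` and `run_rnn` and the weight values are illustrative choices, not taken from any particular library. Real RNNs use learned weight matrices and vector-valued states.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: blend the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence in order, carrying the hidden state forward."""
    h = 0.0  # the memory starts out empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)  # new memory depends on old memory
        states.append(h)
    return states

# Only the first input is non-zero, yet later hidden states stay non-zero:
# the network is still "remembering" that first input.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

In this sketch the hidden state shrinks a little at every step, which hints at why plain RNNs gradually lose track of early inputs.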

This unique ability sets RNNs apart from traditional neural networks, called feedforward neural networks (FNNs). FNNs have no hidden state that persists from one input to the next. They process data in only one direction, from input to output, without cycling back over previous information.2 This makes them better for tasks where the order or context of the data is irrelevant. In handwriting recognition, for example, FNNs only need to identify the independent features of each character and not the sequence of strokes.

On the other hand, RNNs have a loop that allows information to be passed along as the system processes data.2 In this way, the models are self-looping or recurrent. This is essential for generating words or sentences — where the order and context of the data do matter.
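
The difference between the two designs can be illustrated with a small experiment. The sketch below assumes toy single-number models with hand-picked weights (the function names and values are hypothetical). It feeds the same numbers in two different orders to a feedforward function, which sees each item on its own, and to a recurrent loop, which carries a state from one item to the next.

```python
import math

def feedforward(x, w=0.5, b=0.1):
    """Feedforward: the output depends only on the current input."""
    return math.tanh(w * x + b)

def rnn_final_state(seq, w_x=0.5, w_h=0.8):
    """Recurrent: each step also depends on the state left by earlier steps."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_x * x + w_h * h)
    return h

seq = [1.0, 2.0, 3.0]
rev = list(reversed(seq))

# The feedforward model produces the same set of outputs in either order.
ff_forward = sorted(feedforward(x) for x in seq)
ff_reversed = sorted(feedforward(x) for x in rev)

# The recurrent model ends in a different state when the order changes.
rnn_forward = rnn_final_state(seq)
rnn_reversed = rnn_final_state(rev)

print(ff_forward == ff_reversed)
print(rnn_forward, rnn_reversed)
```

Because the loop feeds each state into the next step, reordering the inputs changes the result, which is the property that lets an RNN tell "dog bites man" apart from "man bites dog".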

Imagine you’re telling your friend a story. For the end of the story to make sense, your friend has to remember important details from earlier parts of the story. Your friend may even be able to predict the end of your story based on what you’ve told them so far. An RNN works just like this, remembering the information it has received and using this information to understand and predict what’s coming next.

This makes RNNs well-suited for natural language processing (NLP), natural language generation (NLG), speech recognition, machine translation, autocomplete, and predictive text applications. For example, Google Translate has used RNNs to translate text and Apple’s Siri has used RNNs to recognize and generate speech.3

While RNNs do not fully mimic the complexity of our brains, they are great at identifying patterns, understanding context, and remembering sequences of events — just like us. This suggests that RNNs may be able to provide insight into our own cognitive processes. RNNs can even model human decision-making processes to help us learn about our behavior.

Beyond this, RNNs power many of the technologies you interact with daily and continue to find a place in emerging systems like self-driving cars, smart home devices, fraud detection, credit scoring, customer behavior analysis, healthcare research, and much more. We’ll discuss some of these interesting applications below and address the ever-present challenge of ensuring these tools are used ethically and responsibly.

“There must be a trick to the train of thought, a recursive formula. A group of neurons starts working automatically, sometimes without external impulse. It is a kind of iterative process with a growing pattern. It wanders about in the brain, and the way it happens must depend on the memory of similar patterns.”


Stanislaw M. Ulam, Adventures of a Mathematician

Key Terms

Artificial Neural Networks (ANNs): Machine learning models somewhat inspired by the structure of the human brain. These models are made up of interconnected nodes called artificial neurons — much like the neurons in our brains — that transmit information between each other. These artificial neurons are arranged in layers: an input layer, hidden layers that process data, and an output layer. This structure allows computers to learn how to process data based on the input and desired output, very loosely mimicking how our brains learn from experience.

Deep Learning: An advanced type of machine learning based on artificial neural networks. These networks are made up of multiple hidden layers for processing data, giving them the ability to learn autonomously and make decisions much like a human.

Sequential Data: Data organized in a sequence where the order matters. In sequential data, certain data points are dependent on other data points, such as letters in a word or words in a sentence. RNNs can understand and remember the relationships between these data points to predict what comes next.

Feedforward Neural Networks (FNNs): A type of artificial neural network designed to process information in only one direction. Unlike RNNs, FNNs have no memory for previous inputs when processing current inputs. As a result, they cannot understand data sequences where context matters, such as sentences or time-series data. For example, when receiving a sentence as input, FNNs would treat each word in the sentence independently and would not be able to understand its meaning.

Long Short-Term Memory (LSTM): These are networks designed to extend the memory capacity of RNNs. LSTMs essentially assign “weight” to information, allowing the system to hold onto important information for longer while forgetting less important information.3 This allows RNNs to understand context over longer sequences, making them ideal for tasks like language generation.

Transformer: A type of neural network architecture that can understand context and relationships in sequential data more efficiently than RNNs. Unlike RNNs, which process data in a sequence, transformers process entire sequences simultaneously. They use what’s called “attention mechanisms” to look at the whole sequence at once and pick out important data points. This allows transformers to remember context from much longer sequences and understand more complex language than RNNs.

Natural Language Processing (NLP): The subfield of computer science concerned with giving computers the ability to understand and generate human language. RNNs are often used for NLP tasks, allowing language models to understand context and relationships between words.
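
As an illustration of the Long Short-Term Memory entry above, the following sketch shows one common way an LSTM decides what to keep and what to forget. It assumes a toy cell in which every quantity is a single number and the gate weights are set by hand rather than learned; the names `lstm_step`, `run`, `remember`, and `forget` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps a gate name to (w_x, w_h, b)."""
    def gate(name, squash):
        w_x, w_h, b = p[name]
        return squash(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # share of old memory to keep (0 to 1)
    i = gate("input", sigmoid)        # share of new information to store (0 to 1)
    o = gate("output", sigmoid)       # share of memory to reveal (0 to 1)
    g = gate("candidate", math.tanh)  # the new information on offer (-1 to 1)

    c = f * c_prev + i * g            # updated long-term cell state
    h = o * math.tanh(c)              # updated hidden state
    return h, c

def run(params, steps=50, x=0.5, c_start=1.0):
    """Run the cell for several steps and return the final cell state."""
    h, c = 0.0, c_start
    for _ in range(steps):
        h, c = lstm_step(x, h, c, params)
    return c

# Hand-set gates (an assumption for illustration; real LSTMs learn these).
remember = {
    "forget": (0.0, 0.0, 10.0),    # gate almost fully open: keep old memory
    "input": (0.0, 0.0, -10.0),    # gate almost closed: ignore new input
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
forget = dict(remember, forget=(0.0, 0.0, -10.0))  # now discard old memory

kept = run(remember)
lost = run(forget)
print(kept, lost)
```

With the forget gate held open, the stored value survives 50 steps almost unchanged; with it closed, the value is wiped almost immediately. A trained LSTM learns when to do each.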

History

The idea of artificial neural networks was first proposed in 1944 by University of Chicago researchers Warren McCulloch and Walter Pitts.4 In the decades to follow, neural networks were a focus of research in the fields of neuroscience and computer science.

The first major development of modern RNNs occurred in the 1980s, inspired by our understanding of the human brain’s ability to process sequential information. Researchers David Rumelhart, Geoffrey Hinton, and Ronald Williams were among the first to describe networks with internal memory states.5

However, early RNNs struggled with some significant challenges during training. They had a hard time learning relationships between data points separated by large distances in data sequences. Basic RNNs, often called vanilla RNNs, have trouble remembering information across long sequences of data.6 This limitation led to the development of long short-term memory (LSTM) architecture in 1997.7

As mentioned earlier, LSTMs allow RNNs to understand long-range dependencies between data points, improving accuracy in applications like speech recognition, machine translation, and language modeling. This development, along with advancements in computational power, led to significant advancements in the NLP field and improved the performance of various AI applications through the 2000s.6

While LSTMs are a vast improvement over regular RNNs, they can still struggle with very large datasets — they require significant computational power to process vast amounts of data in sequence. Transformer architecture was introduced in 2017 to address the limitations of sequential processing. Due to its ability to process data in parallel, transformer architecture is often preferred over RNNs for training natural language processing applications like ChatGPT.

Consequences

Despite the growing preference towards transformer architecture, RNNs, especially LSTMs, are still used today. RNNs are particularly useful for processing time series data where they must understand the sequence of events over time.2 LSTMs are also well suited to real-time processing, like speech recognition, because they can process sequential data one step at a time and do not require the whole sequence to be available at once.

RNNs are also used for many tasks that require systems to make accurate predictions based on previous data. This is valuable for a broad range of data analysis and forecasting applications. For example, businesses frequently rely on RNN capabilities to generate reports, manage inventory, and predict customer service needs. Governments could even use RNNs to predict energy demand or create spending forecasts to help with budgeting and resource allocation.8

Of course, all this still requires a great deal of human oversight. The accuracy and fairness of RNNs depend heavily on their training data, which can be plagued by human inaccuracies and biases. If there are biases in the training data, these can exist in the RNN models as well.

Human oversight is required to avoid issues with bias, especially as RNNs are used to help us make decisions in high-risk fields like finance and healthcare. Imagine how biases in these decision-making areas could lead to harmful consequences. The last thing we want is RNNs propagating systemic discrimination when deciding on insurance coverage eligibility, allocating healthcare resources, or assessing loan applications.

The Human Element

In a way, RNNs mimic how we humans process data. Is this useful for understanding our own cognition? Can RNNs model our decision-making processes and help explain our behavior? Researchers seem to think so. Evidence suggests that deep learning models such as RNNs can be trained to predict human decisions.9 In fact, they perform better at this task than reward-based models that assume humans will always choose the most rewarding option. This suggests that RNNs can help us understand how humans make decisions based on a process of pattern recognition.

RNNs could even help us diagnose mental disorders or predict our emotional reactions to everyday events. One fascinating study explored the use of RNNs to analyze social media posts to identify users with possible mental health issues.10 The researchers suggest that this application could help us understand the impact of social media on mental health and introduce measures to reduce related harms.

Another study explored RNNs’ ability to predict how users will react to interactions with others on social media platforms.11 The authors found that RNNs could estimate someone’s emotional state by recognizing patterns in their social media conversations. The application here? RNN tools could help people write comments that positively impact others. Empathetic language suggestions could be particularly useful for people offering online peer-to-peer mental health support.

Controversies

Challenges with Large Datasets

While RNNs have several valuable applications, they do face challenges, especially when processing large datasets. We’ve already talked extensively about how RNNs have trouble remembering past information in long sequences. As sequences get longer, information can get lost along the way, just as you might have trouble remembering the details of a conversation that took place a long time ago.

This is called the vanishing gradient problem.1 RNNs learn through a process called backpropagation, which measures how far each prediction was from the target and sends that error backwards through the network as feedback signals called gradients. Gradients direct the system to adjust its parameters to reduce errors, but they can become very small as the feedback travels backwards through many time steps. This makes it difficult for RNNs to learn from steps early in the sequence.
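
A rough numerical sketch can show how quickly this feedback fades. The example assumes the same kind of toy single-number RNN with a tanh activation and hand-picked weights (the function name and values are hypothetical). It uses the chain rule to compute how strongly the final hidden state depends on the very first one, for a short sequence and for a long one.

```python
import math

def gradient_wrt_first_state(inputs, w_x=0.5, w_h=0.8):
    """Return d h_T / d h_0 for a scalar tanh RNN, built up by the chain rule."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        # Local derivative d h_t / d h_(t-1); its size is at most w_h,
        # so the running product shrinks whenever w_h is below 1.
        grad *= w_h * (1.0 - h * h)
    return grad

grad_short = gradient_wrt_first_state([1.0] * 5)
grad_long = gradient_wrt_first_state([1.0] * 50)

print(grad_short)
print(grad_long)
```

Each extra time step multiplies the gradient by another number smaller than one, so after 50 steps the signal reaching the start of the sequence is vanishingly small compared with the 5-step case.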

RNNs can also be slow and inefficient when processing training data. RNNs take a long time to train because they process information in a sequence, so they’re not well-suited for large datasets — such as those used to train large language models like ChatGPT and Gemini.2 

Transformer architecture offers an ideal solution to these challenges. Because they process data in parallel, transformers can be trained much more efficiently. Transformers also pay attention to relationships between data points no matter how distant they are, improving their predictive accuracy.
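
The idea of attending to every position at once can be sketched in a few lines. The example below assumes a heavily simplified form of scaled dot-product self-attention in which each item in the sequence acts as its own query, key, and value; real transformers add learned projections, multiple attention heads, and positional information. The name `self_attention` and the sample sequence are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified self-attention: every item is compared with every other item."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Similarity between this item and each item in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Output is a weighted average of all items in the sequence.
        out = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# The first and last items are identical; the middle two are different.
sequence = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(sequence)

print(weights[0])
```

In this sketch the first item gives the distant last item exactly as much weight as it gives itself, and more than it gives its immediate neighbor. Distance in the sequence plays no role, and each item's row of weights can be computed independently of the others, which is what makes parallel processing possible.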

Privacy Risks

RNNs receive training data that they use to learn patterns and context for processing future inputs. Depending on where this data comes from, this can present significant privacy risks. Machine learning models are frequently trained on personal data that could include personal details about people’s health, finances, demographics, location, and other sensitive topics — often unbeknown to the people at risk. Even when this training data is secured, sharing the outputs of these models can cause information to leak unintentionally.

This is a particular problem for RNNs because they have a memory for previous information, unlike feedforward neural networks. As expected, researchers have found that RNNs are more susceptible to privacy risks than FNNs.12 Experts are currently exploring potential methods for protecting sensitive data when training RNNs, but more research is needed to develop reliable solutions. Ensuring data is handled ethically will require strict adherence to privacy regulations and a commitment to transparency and informed consent.

Existing privacy protection initiatives, such as the GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act), oversee how personal information is used, often serving as the first line of protection against the risks associated with new technologies. However, these privacy initiatives lack specific guidance for regulating AI.13 The recent EU AI Act attempts to fill in these regulatory gaps and may become a benchmark for governing AI in other jurisdictions around the world.

Related TDL Content

Machine Learning

Machine learning is a branch of AI focused on teaching computers to learn through experience and improve their performance over time. RNNs are a type of machine learning process called deep learning — you can think of deep learning as a more advanced or evolved version of machine learning. This article explores machine learning in-depth and provides several useful analogies that can help you wrap your mind around how it all works.

Artificial Intelligence Models

RNNs are just one type of artificial intelligence model. AI models comprise all the programs, tools, and frameworks that simulate human intelligence. These models share a fascinating history and have several interesting applications in modern tools. This article deep-dives into these topics and explores the ethical controversies associated with AI models.

References

  1. What is RNN? - Recurrent Neural Networks Explained. (n.d.). AWS. Retrieved May 23, 2024, from https://aws.amazon.com/what-is/recurrent-neural-network/
  2. Why Recurrent Neural Networks (RNNs) Dominate Sequential Data Analysis. (n.d.). Shelf.io. Retrieved May 23, 2024, from https://shelf.io/blog/recurrent-neural-networks/ 
  3. Donges, N., & Urwin, M. (2024, Feb 28). What Are Recurrent Neural Networks (RNNs)? Built In. Retrieved May 23, 2024, from https://builtin.com/data-science/recurrent-neural-networks-and-lstm 
  4. Hardesty, L. (2017, April 14). Explained: Neural networks. MIT News. Retrieved May 23, 2024, from https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414 
  5. Rumelhart, D., Hinton, G. & Williams, R. (1986) Learning representations by back-propagating errors. Nature, 323, 533–536. https://doi.org/10.1038/323533a0
  6. Yanhui, C. (2021, March 8). A Battle Against Amnesia: A Brief History and Introduction of Recurrent Neural Networks. Towards Data Science. Retrieved May 23, 2024, from https://towardsdatascience.com/a-battle-against-amnesia-a-brief-history-and-introduction-of-recurrent-neural-networks-50496aae6740 
  7. Hochreiter, S. & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Comput, 9(8): 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 
  8. Yang C-H, Molefyane T, Lin Y-D. (2023) The Forecasting of a Leading Country’s Government Expenditure Using a Recurrent Neural Network with a Gated Recurrent Unit. Mathematics, 11(14): 3085. https://doi.org/10.3390/math11143085 
  9. Fintz, M., Osadchy, M. & Hertz, U. (2022) Using deep learning to predict human decisions and using cognitive models to explain deep learning models. Sci Rep, 12 (4736). https://doi.org/10.1038/s41598-022-08863-0 
  10. Bouarara, H. A. (2022). Recurrent Neural Network (RNN) to Analyse Mental Behaviour in Social Media. DOI: 10.4018/978-1-6684-6307-9.ch030
  11. Kanaparthi, S. D., Patle, A., & Naik, K. J. (2023). Prediction and detection of emotional tone in online social media mental disorder groups using regression and recurrent neural networks. Multimedia tools and applications, 1-21. https://doi.org/10.1007/s11042-023-15316-x 
  12. Yang, Y., Gohari, P., & Topcu, U. (2022) On the Privacy Risks of Deploying Recurrent Neural Networks in Machine Learning Models. Proceedings on Privacy Enhancing Technologies. 68-84 http://dx.doi.org/10.56553/popets-2023-0005 
  13. Lawton, G. (2024, April 11). AI and GDPR: How is AI being regulated? TechTarget. Retrieved June 7, 2024, from https://www.techtarget.com/searchdatabackup/feature/AI-and-GDPR-How-is-AI-being-regulated

About the Author

Kira Warje

Kira holds a degree in Psychology with an extended minor in Anthropology. Fascinated by all things human, she has written extensively on cognition and mental health, often leveraging insights about the human mind to craft actionable marketing content for brands. She loves talking about human quirks and motivations, driven by the belief that behavioural science can help us all lead healthier, happier, and more sustainable lives. Occasionally, Kira dabbles in web development and enjoys learning about the synergy between psychology and UX design.
