Natural Language Generation

The Basic Idea

You may know by now that artificial intelligence (AI) can produce incredibly human-like text and speech. The fascinating technology behind this generative ability is natural language generation (NLG), a software process driven by AI that outputs natural written language or speech from data.1,2

NLG translates complex computer data into a language we can understand. It involves generating coherent, contextually appropriate, and often creative text. Essentially, this allows computers to communicate with us using words rather than numbers and symbols. AI systems that generate text, such as ChatGPT, rely on NLG technology to respond to input in a way that seems human.

But NLG isn’t limited to advanced conversational tools like chatbots. Weather reports are an excellent example of simpler NLG in action. These NLG systems take meteorological data, such as temperature readings and satellite imagery, and translate it into reports. This allows weather apps to generate clear and concise weather forecasts that anyone can understand. NLG is also used in financial reporting, medical reporting, customer service chatbots, sports summaries, accessibility tools, content creation, and many more valuable applications.

The speed and accuracy with which computers can translate data to text are what make these tools so valuable. AI systems can now produce time-sensitive information much faster than humans, which is very beneficial for industries like journalism and healthcare that rely on rapid data analysis. However, NLG systems can also generate inaccurate information and perpetuate biases present in their training data, so they still require human oversight (more on this later).

So, how does NLG work? Watching AI programs like ChatGPT generate human-like text seems almost magical. However, NLG actually relies on several technologies working together to execute a series of steps.2 Let’s go over these steps briefly:

  1. Data Analysis: The system filters data from prompts and databases to identify key topics and create a final product that will be relevant to the user.
  2. Data Understanding: Using machine learning, the NLG system identifies patterns in the data and adds context to the information based on its training.
  3. Document Planning: The system decides how exactly to structure and present the output. This step generates an outline for the text.
  4. Sentence Assembly and Creation: Sentences are constructed to summarize the topic.
  5. Grammatical Structuring: The program applies grammatical rules to ensure the output (text) sounds natural to us humans.
  6. Output: Finally, the text is generated and the output is delivered in the desired format (written text or speech).
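
The steps above can be sketched in a few lines of Python for the weather-report case mentioned earlier. This is a minimal, template-based illustration, not how any particular weather app or a system like ChatGPT is built. The Reading class, the rain thresholds, the sentence templates, and the sample values are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One hypothetical weather observation (all fields are illustrative)."""
    city: str
    high_c: float
    low_c: float
    rain_chance: float  # probability between 0 and 1


def select_content(reading: Reading) -> dict:
    """Steps 1-2: keep the facts worth reporting and attach simple context."""
    facts = {
        "city": reading.city,
        "high": round(reading.high_c),
        "low": round(reading.low_c),
    }
    # The thresholds below are arbitrary choices for this sketch.
    if reading.rain_chance >= 0.6:
        facts["rain"] = "likely"
    elif reading.rain_chance >= 0.3:
        facts["rain"] = "possible"
    return facts


def plan_document(facts: dict) -> list[str]:
    """Step 3: decide which messages to include and in what order."""
    plan = ["temperature"]
    if "rain" in facts:
        plan.append("rain")
    return plan


def realize(plan: list[str], facts: dict) -> str:
    """Steps 4-6: turn each planned message into a sentence and join them."""
    templates = {
        "temperature": "{city} will reach a high of {high} degrees Celsius, with a low of {low}.",
        "rain": "Rain is {rain}, so consider bringing an umbrella.",
    }
    return " ".join(templates[message].format(**facts) for message in plan)


def generate_report(reading: Reading) -> str:
    """Run the whole pipeline on one reading."""
    facts = select_content(reading)
    return realize(plan_document(facts), facts)


print(generate_report(Reading("Montreal", 21.4, 12.8, 0.7)))
# prints: Montreal will reach a high of 21 degrees Celsius, with a low of 13. Rain is likely, so consider bringing an umbrella.
```

Keeping content selection, planning, and sentence realization in separate functions mirrors the staged process described above. A learned system would replace the hand-written thresholds and templates with patterns picked up from training data.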

This process produces text that is amazingly similar to natural language. However, NLG still faces several challenges, often struggling with the nuances of human creativity like humor, sarcasm, or producing truly original ideas. As NLG systems continue to learn from us, we can expect them to get better at capturing the intricacies of our language.

“Language is a process of free creation; its laws and principles are fixed, but the manner in which the principles of generation are used is free and infinitely varied. Even the interpretation and use of words involves a process of free creation.”

Noam Chomsky, renowned linguist and philosopher

Key Terms

Machine Learning: A branch of AI that allows computers to learn on their own by analyzing data, matching patterns, and making predictions. The goal of machine learning is to develop AI programs that can learn without explicit instructions. NLG relies on machine learning to generate text.

Language Model: A type of machine learning model that’s been trained on text data.3 Language models analyze this data to learn patterns, like how frequently different words appear together. Some language models continue to learn through ongoing interactions with people. Large language models (LLMs) are particularly complex because they’re trained on vast sets of data, which allows them to understand more abstract concepts.
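
To make the idea of learning "how frequently different words appear together" concrete, here is a minimal sketch of a bigram model, one of the simplest kinds of language model. The tiny corpus and the function names are assumptions for illustration. LLMs use neural networks rather than raw counts, so this shows only the underlying intuition.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real language models train on vastly more text.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
tokens = corpus.split()

# Count how often each word is followed by each other word (bigram counts).
following = defaultdict(Counter)
for current_word, next_word in zip(tokens, tokens[1:]):
    following[current_word][next_word] += 1


def most_likely_next(word: str) -> str:
    """Return the word that most often followed `word` in the corpus."""
    return following[word].most_common(1)[0][0]


def generate(start: str, length: int) -> str:
    """Greedily extend `start` by repeatedly picking the most frequent next word."""
    words = [start]
    for _ in range(length):
        words.append(most_likely_next(words[-1]))
    return " ".join(words)


print(generate("the", 3))  # prints: the cat sat on
```

Because "cat" follows "the" twice in this corpus while "mat" and "sofa" each follow it once, the model picks "cat" as the most likely continuation.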

Natural Language Processing (NLP): A technology computers use to understand and create human language. The field of NLP combines expertise from various disciplines including computer science and linguistics. NLG is a subset of NLP that focuses on the language generation component.4

Natural Language Understanding (NLU): A process that translates human language into a form that computers can work with. It aims to decipher the meaning behind our text or speech, allowing computers to do what we ask of them.4

Recurrent Neural Networks (RNNs): A type of artificial neural network (a deep learning algorithm inspired by the human brain) that can process sequential data, such as natural language. RNNs have a built-in memory that allows these algorithms to remember previous data points in order to make future predictions. In language generation, RNNs allow language models to remember the context of previous words to produce coherent text.
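
The "built-in memory" can be illustrated with a single recurrent unit. In this minimal sketch, the weights are fixed numbers chosen for the example (a real RNN learns them during training and works with vectors rather than single values), and the function names are hypothetical.

```python
import math


def rnn_step(hidden: float, x: float, w_hidden: float = 0.5, w_input: float = 1.0) -> float:
    """One recurrent step: mix the previous hidden state with the new input."""
    return math.tanh(w_hidden * hidden + w_input * x)


def run_rnn(inputs: list[float]) -> float:
    """Feed a sequence through the unit one item at a time, starting from zero memory."""
    hidden = 0.0
    for x in inputs:
        hidden = rnn_step(hidden, x)
    return hidden


# Both sequences end with the same input (1.0), but their earlier inputs differ.
after_positive_context = run_rnn([1.0, 1.0])
after_negative_context = run_rnn([-1.0, 1.0])

print(after_positive_context, after_negative_context)
```

The two results differ even though the final input is identical, because the hidden state carries information about what came earlier in the sequence.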

Transformer Architecture: A deep learning model designed to handle sequential data more efficiently than RNNs.5 Transformer models achieve this by applying an “attention mechanism,” which loosely mimics human attention by assigning levels of importance to specific words in a sentence. This mechanism allows transformer models to find patterns in large amounts of textual data with much greater speed and accuracy. The emergence of transformers has led to recent breakthroughs in the field of NLP and the development of LLMs.
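
As a rough illustration of assigning importance to words, the sketch below computes scaled dot-product attention for a single query. The two-dimensional word vectors are invented for the example, and reusing the keys as values is a simplification. Real transformers learn separate query, key, and value projections and run many attention "heads" in parallel.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: list[float], keys: list[list[float]], values: list[list[float]]):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Hypothetical 2-dimensional vectors for the words in "the cat sat".
words = ["the", "cat", "sat"]
keys = [[0.1, 0.0], [1.0, 0.2], [0.3, 0.9]]
values = keys  # reusing keys as values keeps the sketch short
query = [1.0, 0.0]  # a query that happens to resemble "cat"

weights, output = attention(query, keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

The word whose key vector best matches the query ("cat" here) receives the largest weight, so it contributes most to the output.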

History

We can trace NLG back to the 1950s, when research on machine translation first began.6 The goal of early language generation systems was to translate texts from one language to another, establishing a foundation for natural language processing technology.

One of the first NLG programs, a chatbot called ELIZA, emerged in the mid-1960s.7 This program could simulate conversation by pattern matching and searching for keywords in user input. While ELIZA was limited by pre-set rules and lacked a genuine understanding, the program demonstrated the potential for computers to engage in human-like conversation.
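
The keyword-and-pattern approach can be sketched as follows. The rules and replies here are hypothetical examples written in the spirit of ELIZA, not its original script, and the sketch omits features of the real program such as swapping pronouns ("my" to "your").

```python
import re

# Hypothetical rules in the spirit of ELIZA; these are not the original script.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your family."),
]
DEFAULT_REPLY = "Please go on."


def respond(user_input: str) -> str:
    """Return the reply for the first rule whose pattern matches the input."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the user's own words inside the canned reply.
            return template.format(*match.groups())
    return DEFAULT_REPLY


print(respond("I am feeling tired."))      # prints: How long have you been feeling tired?
print(respond("My mother called today."))  # prints: Tell me more about your family.
print(respond("The weather is nice."))     # prints: Please go on.
```

Nothing in this program understands the conversation. It only echoes fragments of the input back inside pre-set templates, which is why rule-based systems of this kind hit the limits described in the following paragraphs.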

In the following decades, more and more programs were introduced, but they all struggled with the nuances of human language. This should come as no surprise: even humans can struggle with contextual subtleties!

Researchers eventually acknowledged that a strict, rule-based approach wasn’t going to work. Subsequent language models needed the ability to learn and generalize. It was in the 1990s that language processing began to embrace the use of vast datasets to train programs on complex language. As a result, we started seeing commercial applications of NLG around this time.6 For example, systems were developed to translate weather data into clear forecasts for the public or produce financial reports for businesses.

In recent years, researchers developed deep learning techniques that relied on complex neural networks—models inspired by the human brain that allow computers to process and learn from data via interconnected nodes or “neurons.” This significantly advanced NLG capabilities, allowing systems to understand complex patterns in language data and generate coherent and contextually relevant text.6

Consequences

NLG has several valuable use cases in nearly every industry, from finance to travel.8 For one thing, NLG allows businesses to make automated customer interactions more personal and engaging, potentially boosting sales. NLG can also automate many data-processing tasks, like data analysis and document creation, to reduce operational costs and free up employee time for more human-centric projects. 

NLG also has obvious consequences in marketing, allowing businesses to generate blog articles, social media posts, website copy, and ad copy, all highly targeted to specific audience needs. Currently, around 67% of organizations use generative AI to work with human language and produce content.9

For the rest of us, NLG can democratize information, allowing people without specialized knowledge or scientific backgrounds to understand complex information. Any of us can use NLG to understand financial reports, medical findings, and other valuable bits of information without relying on experts to pass insights along.

NLG can also be used to personalize learning, improve information accessibility for people with hearing or visual impairments, and break down language barriers to improve communication across cultures. This is only the tip of the iceberg. There are countless ways NLG can enhance our lives and we’re still uncovering new use cases every day.

The Human Element

The rise of NLG systems raises several interesting questions about the psychological impact of relying on AI-generated text and conversing with language models.

Human Communication and Connection

At first glance, it might seem like NLG systems can help people improve their communication skills. However, research suggests that AI chatbots might actually do more harm than good in this area, particularly for people who struggle with human interaction due to autism or anxiety.10 Interacting with AI tools that can generate human-like text can foster habits that perpetuate social isolation—as people become accustomed to the predictability of these systems, they might become less comfortable with real human interaction.

On the other hand, some experts suggest that AI can strengthen our communication when used to support human interaction (rather than replace it entirely). In one study, researchers found that AI-generated feedback helped improve the use of empathy in conversations that took place on a peer-to-peer support platform.11 This suggests that AI systems could help people navigate other emotionally charged social situations, such as arguments among family members or managers giving unfavorable performance reviews.

Attention Spans and Critical Thinking Skills

Many people also worry that NLG will affect our cognitive abilities like attention and critical thinking. Could we lose our ability to understand complex information and form conclusions if we constantly rely on AI to summarize information for us? For one thing, experts suggest that short-form AI content could shorten our attention spans. Having easy access to highly relevant information might make us less likely to sift through content that challenges our assumptions and beliefs.12 The fear is that our desire for immediately gratifying content will lead us into echo chambers that reinforce our existing beliefs and reduce our opportunities to explore new ideas.

Human Creativity

NLG systems can generate poems, songs, and even entire books, so it’s no surprise that people are worried about these tools diminishing our creative value. For a lot of us, our ability to create is deeply tied to our self-worth, so this does not bode well for our mental well-being. 

Fortunately, experts in this area believe generative AI has the potential to promote human creativity by providing thought-provoking ideas or leading us toward insights we might not have gathered on our own.13 However, this also creates a new risk: if we all rely on generative AI for creativity, we might start thinking too similarly, decreasing our cognitive diversity. Since AI is trained on the content we create, this could easily spiral as our creative output becomes increasingly generic.

Controversies

Misinformation and Bias

One of the most pressing concerns surrounding NLG is the ease and speed with which these systems can spread misinformation. Not only can these systems generate content with the aim of going viral, but they can do so in a mass-produced fashion. One recent study explored this, finding that LLMs can generate seemingly credible misinformation.14

NLG systems can also perpetuate biases present in their training data. For instance, an NLG program trained on data containing gender stereotypes might generate text that reproduces these biases. Researchers are currently exploring ways to control biases in language generation. This will require a collaborative effort from multiple disciplines, including social science, linguistics, computer science, ethics, and law.

Job Displacement

Whether NLG systems will replace jobs is hotly debated. Generative AI programs can automate tasks like compiling information into reports, creating text-based content, responding to customer support inquiries, and translating between languages. However, experts argue that NLG is unlikely to completely replace these jobs, at least not in the near future.7

This is because NLG excels at repetitive tasks but struggles with higher-level skills such as complex problem-solving and emotional intelligence. Generative AI is also prone to “hallucinations,” in which it generates incorrect information.

Privacy and Copyright Issues

Because NLG systems are trained on massive amounts of data, it can be difficult to tell whether data sources contain personal information. At the same time, businesses often use these tools to analyze data, such as personal information about customers, and it’s unclear what happens with this information. When NLG systems are dynamically learning from interactions, could sensitive data be shared with others?

We have also seen issues with NLG systems being trained on copyrighted content—Sarah Silverman recently sued OpenAI and Meta for including her copyrighted materials in the training data for their AI models.15 Many other authors and publishers have expressed concern about the ability of these systems to generate content that’s very similar to copyrighted material. On the other hand, tech companies argue that limiting their use of content could hurt progress and that “fair use” allows them to use copyrighted material to advance the knowledge of AI models.

Currently, we don’t have clear solutions to any of these data use challenges. It will require significant interdisciplinary collaboration to establish regulations that reduce the risk to personal information and resolve questions about the legality of using copyrighted content in NLG programs.

Related TDL Content

Building a culture of innovation around Generative AI & LLMs

This article explores how organizations can build a culture of innovation around the use of AI by applying insights from behavioral science. It also discusses key challenges facing the use of AI, such as ethics, and how organizations can develop policies to govern these emerging technologies.

The AI Governance of AI

AI systems are shaping our behaviors in some interesting ways. This article examines how we can build accountability and transparency into these tools so that we understand when and how we are being influenced by AI. Will AI be used to govern AI? Check out this article to explore this fascinating question.

References

  1. What is Natural Language Generation (NLG)? (n.d.). Qualtrics. Retrieved May 9, 2024, from https://www.qualtrics.com/experience-management/customer/natural-language-generation/ 
  2. Wigmore, I. (n.d.). What is Natural Language Generation (NLG)? TechTarget. Retrieved May 9, 2024, from https://www.techtarget.com/searchenterpriseai/definition/natural-language-generation-NLG 
  3. Çelik, T. (2022, July 20). What Is a Language Model? Deepset. Retrieved May 9, 2024, from https://www.deepset.ai/blog/what-is-a-language-model 
  4. Kavlakoglu, E. (2020, November 12). NLP vs. NLU vs. NLG: the differences between three natural language processing concepts. IBM. Retrieved May 9, 2024, from https://www.ibm.com/blog/nlp-vs-nlu-vs-nlg-the-differences-between-three-natural-language-processing-concepts/ 
  5. Ferrer, J. (2024, Jan). How Transformers Work: A Detailed Exploration of Transformer Architecture. DataCamp. Retrieved May 22, 2024, from https://www.datacamp.com/tutorial/how-transformers-work 
  6. Shukla, N. (2024, April 12). Evolution of Language Models: From Rules-Based Models to LLMs. Appy Pie. Retrieved May 9, 2024, from https://www.appypie.com/blog/evolution-of-language-models 
  7. Frey, C. B., & Osborne, M. (2023). Generative AI and the Future of Work: A Reappraisal. The Oxford Martin Working Paper Series on the Future of Work, No. 2023. https://oms-www.files.svdcdn.com/production/downloads/academic/2023-FoW-Working-Paper-Generative-AI-and-the-Future-of-Work-A-Reappraisal-combined.pdf 
  8. NLG and Its Business Impact Across Industries. (2023, November 23). Cogent Infotech. Retrieved May 9, 2024, from https://www.cogentinfo.com/resources/nlg-and-its-business-impact-across-industries
  9. Uspenskyi, S. (2024, February 27). Large Language Model Statistics And Numbers (2024). Springs. Retrieved May 9, 2024, from https://springsapps.com/knowledge/large-language-model-statistics-and-numbers-2024 
  10. Franze, A., Galanis, C. R., & King, D. L. (2023). Social chatbot use (e.g., ChatGPT) among individuals with social deficits: Risks and opportunities. Journal of Behavioral Addictions, 12(4), 871-872. https://doi.org/10.1556/2006.2023.00057
  11. Sharma, A., Lin, I. W., Miner, A. S., Atkins, D. C., & Althoff, T. (2023). Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nature Machine Intelligence, 5, 46–57. https://doi.org/10.1038/s42256-022-00593-2
  12. Guide, S. (2023, February 27). AI-Generated Short Content: The Next Big Threat To Our Attention Spans. Sustainable Content Marketing, Thrive In Hong Kong & Asia. Retrieved May 9, 2024, from https://manson.space/blog/ai-short-content-attention-span-threat/ 
  13. Shackell, C. (2023, September 27). Will AI kill our creativity? It could – if we don't start to value and protect the traits that make us human. The Conversation. Retrieved May 9, 2024, from https://theconversation.com/will-ai-kill-our-creativity-it-could-if-we-dont-start-to-value-and-protect-the-traits-that-make-us-human-214149 
  14. Pan, Y., Pan, L., Chen, W., Nakov, P., Kan, M.-Y., & Wang, W. (2023). On the Risk of Misinformation Pollution with Large Language Models. https://www.researchgate.net/publication/370981786_On_the_Risk_of_Misinformation_Pollution_with_Large_Language_Models
  15. Milmo, D. (2023, July 10). Sarah Silverman sues OpenAI and Meta claiming AI training infringed copyright. The Guardian. Retrieved May 9, 2024, from https://www.theguardian.com/technology/2023/jul/10/sarah-silverman-sues-openai-meta-copyright-infringement

About the Author

Kira Warje

Kira holds a degree in Psychology with an extended minor in Anthropology. Fascinated by all things human, she has written extensively on cognition and mental health, often leveraging insights about the human mind to craft actionable marketing content for brands. She loves talking about human quirks and motivations, driven by the belief that behavioural science can help us all lead healthier, happier, and more sustainable lives. Occasionally, Kira dabbles in web development and enjoys learning about the synergy between psychology and UX design.
