Content Analysis
What is Content Analysis?
Content analysis is a research technique used to systematically analyze the content of communication. It involves identifying patterns, themes, or biases within qualitative data such as text, images, audio, or video, and interpreting the meaning of the content in its context.
The Basic Idea
Analyzing research data is hard work, no matter what type of data we’re using. Let’s imagine a group of researchers studying people’s choices around plant-based food. You might envision experimental groups and a control group being assigned to different conditions: reading an article about factory farming, reading about the effects of meat production on greenhouse gases, or, for the control group, no article at all. Then, the researchers take everyone to lunch and see what they order: the meat or the plant-based option? The data from this study would be quantitative; researchers would use statistical analysis to determine the percentage of people in each group who chose meat versus vegetarian options and to identify any influencing factors in their results. While this might be what we typically picture when we think of data analysis, it isn’t what every type of study uses.
Content analysis focuses on qualitative data. In this case, rather than analyzing a binary lunch choice, researchers might bring participants in for a focus group. They would ask open-ended questions about how participants make food decisions, how they feel about what they read, and what other factors may be at play for them. Maybe there’s a third group of researchers, interested in understanding why people switch to vegan diets in the first place. They would turn to the internet, where there are endless YouTube videos, Reddit threads, and Facebook posts from people explaining exactly what they eat and why. For both of these research groups, the data they’re working with isn’t numerical. Instead, they’re analyzing an abundance of content.
To make sense of focus group transcripts, social media posts, or any other large body of qualitative data, researchers use content analysis, which helps us uncover patterns in the data. There are a number of ways we can analyze this content, and even though the data is qualitative, we can quantify some of the results: How often is a certain word used? Are certain phrases more likely to be said together? Are certain groups more likely to reference the same thing?
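As a small illustration of this kind of quantification, the Python sketch below counts word frequencies and word pairs that occur in the same response. The transcript snippets are invented for the example, and only the standard library is used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical focus group responses (invented for illustration)
responses = [
    "I stopped eating meat after reading about factory farming",
    "climate change made me rethink meat and dairy",
    "factory farming videos pushed me toward a vegan diet",
]

# How often is a certain word used?
word_counts = Counter(word for r in responses for word in r.lower().split())
print(word_counts.most_common(5))

# Which words tend to be said together (within the same response)?
pair_counts = Counter()
for r in responses:
    unique_words = sorted(set(r.lower().split()))
    pair_counts.update(combinations(unique_words, 2))
print(pair_counts.most_common(3))
```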
Alternatively, we can perform a more qualitative analysis, coding the words from the data, grouping the codes into themes, and reflecting on patterns within the themes to understand the meaning or bigger picture behind the data. Regardless of which method we choose, when working with qualitative data, we’ll be doing some sort of content analysis.
Key Terms
- Coding: The process of categorizing and labeling pieces of data to identify themes, patterns, and meanings. By identifying recurring topics that emerge from the data, codes can then be grouped into themes, which are key to understanding the deeper meaning of the content.
- Manifest Content: This refers to the explicit, surface-level elements of the content that are directly observable and measurable. Manifest content is straightforward and involves counting and categorizing visible elements, such as words, phrases, or images. For example, looking at social media posts, one could count the number of posts tagged #vegan, or count how often keywords like ‘animal,’ ‘climate change,’ and ‘veganism’ come up in an interview.1
- Latent Content: Latent content refers to the underlying, implicit meanings and themes that are not immediately apparent on the surface. This type of content requires interpretation and an understanding of the context to uncover the deeper significance of the communication. In the same analysis, this might involve interpreting the tone or sentiment behind the keywords (e.g., whether the attitude expressed is positive, negative, or neutral). This can mean unpacking whether words are used sarcastically or supportively.1
- Intercoder Reliability: This measures the degree of agreement among different coders analyzing the same content. High intercoder reliability indicates consistent coding across researchers and is a way of examining potential bias in the analysis process (see the sketch after this list).
- Sampling: When sourcing data from an extremely large resource (for example, Reddit), not every post can be analyzed. Sampling is the process of selecting a representative subset of the content, which ensures that the analysis is manageable and that the findings are generalizable.
- Content Validity: In content analysis, content validity refers to how well the categories, themes, or codes used in the analysis represent all aspects of the phenomena being studied. Achieving high content validity means ensuring that the analysis comprehensively covers the content relevant to the research question and accurately reflects the subject matter.
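To make intercoder reliability concrete, here is a minimal Python sketch of Cohen’s kappa, one widely used agreement statistic, applied to two hypothetical coders labeling the same ten posts. The data and labels are invented for illustration; real projects typically rely on established statistical implementations rather than hand-rolled ones.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement, estimated from each coder's marginal label frequencies
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(codes_a) | set(codes_b))
    return (observed - expected) / (1 - expected)

# Two hypothetical coders labeling the same ten posts (invented data)
coder_1 = ["pro", "pro", "anti", "neutral", "pro", "anti", "pro", "neutral", "pro", "anti"]
coder_2 = ["pro", "anti", "anti", "neutral", "pro", "anti", "pro", "pro", "pro", "anti"]

print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")  # ~0.67 here
```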
History
Content analysis has evolved considerably, transitioning from simple quantitative methods to more complex qualitative techniques. The beginnings of content analysis date back to the 18th century with the study of newspapers and pamphlets to understand public opinion and propaganda, with early researchers using simple counting techniques to analyze word frequency. The first known systematic content analysis was performed by Carl Robert Vilhelm Bjerre in the 19th century, focusing on hymn texts.2
During World War II, this method gained prominence when researchers, including Harold Lasswell, analyzed propaganda to understand its effects on public opinion. Lasswell's work laid the groundwork for content analysis as a systematic research method, emphasizing the importance of studying communication to understand its influence on audiences. In the post-war period, content analysis expanded into journalism, political science, psychology, and marketing. In 1952, Bernard Berelson published Content Analysis in Communication Research, which became a seminal work, formalizing the methodology and its applications.2
Berelson suggested that there are five main purposes of content analysis2:
1. To describe substance characteristics of message content;
2. To describe form characteristics of message content;
3. To make inferences to producers of content;
4. To make inferences to audiences of content;
5. To predict the effects of content on audiences.
Because early content analysis focused heavily on quantitative measures, such as counting word frequencies, themes, and concepts, there was still a world of qualitative research that had been left largely unacknowledged (or at least, not given any scientific credit) until the 1970s. At this point, qualitative content analysis emerged, emphasizing the interpretation of context and underlying meanings in communication. Klaus Krippendorff's work, particularly his book Content Analysis: An Introduction to Its Methodology, introduced more sophisticated and interpretive techniques, blending quantitative rigor with qualitative depth.2
The advent of computers and digital tools in the late 20th century revolutionized content analysis, enabling researchers to handle larger datasets and perform more complex analyses. Software such as NVivo, Atlas.ti, and MAXQDA facilitated both quantitative and qualitative content analysis, allowing for more efficient coding and categorization of textual data.
Nowadays, content analysis has increasingly focused on digital content, leveraging the wide world of the internet and its limitless and rich content. Researchers can now use automated text analysis, machine learning, and natural language processing to analyze vast amounts of data more quickly.
Process
What does the process of content analysis look like in a study? The first step is to define the research question: the objective that the content analysis aims to address. Next, researchers must choose the content to be analyzed. This could be text, images, audio, or video from various sources such as newspapers, social media, interviews, or advertisements. Then it’s time to develop a coding scheme: a set of codes and categories that will be used to analyze the content. This involves defining what each code represents and how it should be applied to the data, and then systematically applying the codes over multiple rounds.
Researchers will then analyze the data, which could involve quantitative counting of codes, qualitative interpretation of their meaning, or both. Either way, they’ll examine the coded data to identify patterns, themes, and relationships. Synthesizing the data in this way is an important step that allows for interpretation of the results, which may involve relating the findings to existing theories, identifying implications for practice, or suggesting areas for further research. Lastly, researchers write up the process and findings of the content analysis, reflect on its limitations, and identify key areas that can be expanded on in future research.
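As a highly simplified sketch of the coding and analysis steps, the Python snippet below applies a hypothetical keyword-based codebook to transcript excerpts and rolls the resulting codes up into themes. In practice, coding is usually performed and refined by human coders over multiple rounds, often in software like NVivo or MAXQDA; this only shows the overall shape of the process.

```python
from collections import Counter

# Hypothetical codebook: each code is triggered by a few keywords
codebook = {
    "animal_welfare": ["factory farming", "cruelty", "animal"],
    "environment": ["climate", "greenhouse", "emissions"],
    "health": ["cholesterol", "protein", "nutrition"],
}

# Hypothetical grouping of codes into broader themes
themes = {
    "ethical_motivation": ["animal_welfare"],
    "practical_motivation": ["environment", "health"],
}

excerpts = [
    "I went vegan after watching a factory farming documentary",
    "cutting meat lowered my cholesterol and helped the climate",
]

def apply_codes(text):
    """Return every code whose keywords appear in the excerpt."""
    text = text.lower()
    return [code for code, keywords in codebook.items()
            if any(kw in text for kw in keywords)]

code_counts = Counter(code for e in excerpts for code in apply_codes(e))
theme_counts = Counter({theme: sum(code_counts[c] for c in codes)
                        for theme, codes in themes.items()})

print(code_counts)   # frequency of each code
print(theme_counts)  # frequencies rolled up by theme
```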
Controversies
Content analysis, like any analysis method, is not without room for error. Researchers must navigate various potential biases that can impact the validity and reliability of their findings. Major errors can occur right from the beginning, namely when sampling. The selection of content to analyze can introduce bias if it isn’t representative of the entire population or domain of interest. Sampling decisions can significantly affect the outcomes and generalizability of the analysis.3
As is true in the rest of the research world, researchers using content analysis are sometimes criticized for choosing content that aligns with their interests or hypotheses. Because they can’t interview every person on the planet or study every Facebook post on the internet, researchers may consciously or unconsciously be selecting content that aligns with what they’re hoping to find. This can be partially mitigated through random sampling or stratified sampling techniques to select content that’s more representative.
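As one hedged illustration of that mitigation, stratified sampling can be as simple as grouping items by a known attribute and drawing the same fraction from each group, as in this Python sketch with invented post data.

```python
import random
from collections import defaultdict

random.seed(42)  # reproducible draws

# Hypothetical posts, tagged with the platform they came from
posts = [{"id": i, "platform": p}
         for i, p in enumerate(["reddit"] * 600 + ["youtube"] * 300 + ["facebook"] * 100)]

def stratified_sample(items, key, fraction):
    """Draw the same fraction from every stratum rather than from the pool at large."""
    strata = defaultdict(list)
    for item in items:
        strata[item[key]].append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(random.sample(group, k))
    return sample

sample = stratified_sample(posts, key="platform", fraction=0.05)
print(len(sample))  # 50 posts: 30 reddit, 15 youtube, 5 facebook
```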
Once data has been gathered, the coding process brings more opportunities for subjectivity and bias. The process of coding and categorizing data can be influenced by the researcher’s personal beliefs, values, and research interests, shaping how data is coded and interpreted. In the same vein, interpretive validity can be called into question, with the potential for cultural bias or contextual misunderstanding. Researchers may misinterpret the meaning of content because they don’t share the creator’s background or culture, or they may be missing important context about when, where, and with whom the content was created. All of this can lead to inaccurate coding and faulty conclusions.
We’ve talked about how content analysis can be either quantitative or qualitative. While quantitative methods can provide valuable data, an overemphasis on counting and frequency can overlook the deeper meanings and complexities of the content. Focusing on surface-level data without exploring underlying themes or contexts can be superficial and lose the richness of qualitative insights. Mixed-methods approaches (combining quantitative and qualitative techniques) and thematic content analysis allow researchers to dive further into people’s true experiences and the way they communicate.
Increasingly, AI can be leveraged at the retrieval and coding stages to improve efficiency and accuracy. While some researchers are open to this new age, others reject the concept of fully automated content analysis. Even in quantitative content analysis, many argue that the human ability to understand nuance, metaphor, and sarcasm is crucial for accurate interpretation, and that automated processes often miss these cues.3 However, as AI develops and its ability to comprehend more subtle language use improves, there is a lot of potential for AI in content analysis.
In the end, qualitative data can provide many insights that quantitative data can’t. Still, beyond the limited generalizability and the subjective, potentially biased nature of sampling, coding, and analysis, content analysis demands an immense amount of time and resources and raises ethical questions. In research that involves interacting with people directly, ensuring confidentiality and protecting the privacy of participants can be challenging, and there are many barriers to ensuring informed consent, especially in sensitive research areas. If researchers use data from social media, especially without users’ knowledge, it’s crucial to consider whether participants would consent to their information being used. When informed consent can’t be obtained, how do we determine the appropriate use of their data? How can we ensure their privacy and protection? Content analysis must balance the wealth of available information with respect for individuals’ privacy.
Related Content
Machine learning is increasingly being used in content analysis, allowing us to analyze huge amounts of data more quickly. This subset of artificial intelligence uses statistical techniques to enable machines to learn from data and improve over time, loosely modeled on human learning.
Contextual inquiry is a research method used in user experience (UX) design to understand how people use a product or service in their real-world environment and context. Learn how contextual inquiry relates to, and differs from, content analysis.
Grounded theory is a qualitative research methodology designed to construct theories that are grounded in systematically gathered and analyzed data. Unlike research methods that start with a hypothesis, grounded theory starts with data collection and then uses that data to develop a theory. Learn how grounded theory and content analysis are related.
References
- Delve. (n.d.). Manifest content analysis vs. latent content analysis. https://delvetool.com/blog/manifest-content-analysis-latent-content-analysis
- Schreier, M. (2012). Qualitative content analysis in practice (pp. 10-23). SAGE Publications.
- Macnamara, J. (2018). Content analysis. In Media and Communication Research Methods. University of Technology Sydney. https://www.researchgate.net/profile/Jim-Macnamara-2/publication/327910121_Content_Analysis/links/5db12fac92851c577eba6c90/Content-Analysis.pdf
About the Author
Annika Steele
Annika completed her master’s at the London School of Economics in an interdisciplinary program combining behavioral science, behavioral economics, social psychology, and sustainability. Professionally, she’s applied data-driven insights in project management, consulting, data analytics, and policy proposals. Passionate about the power of psychology to influence an array of social systems, she has researched reproductive health, animal welfare, and perfectionism in female distance runners.