Data Science

The Basic Idea

Although often viewed as just a bunch of numbers on a screen, data science informs what we know, who we know, and how we view the world. Professionals in the field choose the information, entertainment, and media we consume, making up the “back-end” of virtually all technology and social media. As a result of collecting data and understanding who we—the users—are, data scientists are able to curate information that appeals to us, ever so slightly changing or reinforcing our preferences, beliefs, and ideologies. Data scientists influence everything from what we buy to what causes we care about.

Key Terms

Data

Facts and statistics collected together for reference or analysis.

Computer science

The study of the principles and use of computers.

Data mining

The process companies use to turn raw data into useful information.

Machine learning

The use of computer systems that are able to learn or adapt to circumstances independently, by using algorithms that adjust themselves based on data rather than following fixed instructions.

Artificial intelligence

A computer or robot’s ability to complete tasks that typically require a human’s judgment or intelligence.

History

Throughout the 1980s and ’90s, “data mining” was the term used to refer to analyzing raw data without a hypothesis or specific intention. Other terms such as “data fishing,” “information harvesting,” and “knowledge extraction” were also frequently used to describe this process of extracting information from large databases.

In 2001, computer scientist William S. Cleveland wrote a research paper advocating for statistics to expand beyond theory and into practice. He wanted to combine data mining with computer science, opening up the possibilities for statistics to be a powerful force of innovation. Because this jump would radically change the field of statistics, Cleveland argued that a new name—data science—was in order.

Cleveland was not the first to advocate for this change; however, he is the person most widely recognized for it today. In fact, back in 1985, statistician C.F. Jeff Wu had used the term “data science” to replace the term “statistics” during a lecture in Beijing, and he continued to use it throughout his work leading up to the 2000s.

Throughout the early 2000s, “data science” became a more widely used term and began appearing in the names of committees and journals, notably Columbia University’s The Journal of Data Science in 2003. As the internet became more interactive around the turn of the millennium, the growing volume of data online posed a question to computer scientists: what do we do with all this data?

The data boom sparked the need for answers, which came in the form of data science. In the 15 to 20 years since then, no agreed-upon definition of “data science” has been reached, and professionals are still trying to figure out exactly what this term means. The shift away from the term “statistics,” however, demonstrates data’s introduction to the practical realm. This cultural change shows that statistics are no longer just numbers; now seen as “data,” they can be transformed into insights that help solve real-world problems.

People

William S. Cleveland

William S. Cleveland is an American computer scientist and professor. Having completed a PhD in Statistics at Yale University, Cleveland worked in the Statistics Research Department at Bell Labs for over a decade, after which he became a professor at Purdue University. Cleveland’s research interests have spanned computer networking, machine learning, environmental science, and data visualization, among others. In a 2001 publication, Cleveland formally introduced the term “data science” as the name for a discipline that combines data mining and computer science with statistics.

C.F. Jeff Wu

C.F. Jeff Wu also earned a PhD in Statistics and has worked for many years as a professor of engineering at the Georgia Institute of Technology. He is known for his work on experimental and algorithmic design. During a lecture in Beijing in 1985, Wu used the term “data science” for the first time as an alternative name for statistics. He then gave a lecture in 1997 entitled “Statistics = Data Science?” Although this was not the term’s formal introduction, which came in 2001, Wu brought it to the public and was an early advocate of renaming statistics “data science.”

DJ Patil

DJ Patil is an American mathematician and computer scientist who popularized the term “data scientist” as a professional title. In 2011, Patil wrote the book Building Data Science Teams to describe what being a data scientist means and how to be a successful one. In 2012, he wrote Data Jujitsu: The Art of Turning Data Into Product, which focuses on problem-solving in the data science sphere. Patil served as the United States Chief Data Scientist in the Office of Science and Technology Policy from 2015 to 2017, where he led the country’s efforts to make federal data more openly available to the public.

Consequences

Nowadays, data is collected constantly and at extremely high volumes. Each time you click on a webpage, send an email, or scroll past a targeted advertisement, algorithms are collecting data about your preferences and interests, composing a constantly updated profile of your identity. These are called psychographic profiles.

As you can imagine, large companies that have millions of users receive massive amounts of data, referred to as “big data.” Since roughly 2010, companies have been receiving big data as a result of interactive social media platforms, an innovation that is referred to as Web 2.0.

As a result of this data, all sorts of companies gain insights into whom they should target with their products or services and, more specifically, which products or services to promote, how often, and even at what times of day. Data scientists analyze and interpret this data through algorithms, and make use of the information by targeting advertisements for their products toward those most likely to engage with or buy them. This is a form of choice architecture.

Data scientists also draw other sorts of insights from this raw data that can help them improve their brand. If, for example, an organization notices that a lot of time is being spent on their website trying to find the “contact us” tab, website or user experience designers may choose to make this tab more easily accessible. In this way, without having to call up the company and “speak to a manager,” your instincts and actions automatically feed data to companies, who will then interpret your concerns and fix their bugs accordingly.
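As a rough illustration of the “contact us” example above, the short Python sketch below estimates how long visitors take to reach a contact page. The click log, the session names, the page paths, and the function name are all hypothetical and invented for this example; real analytics tools collect and store this kind of data differently.

```python
from statistics import median

# Hypothetical click log: (session_id, seconds_since_session_start, page).
# The sessions, page paths, and timings are invented for illustration only.
CLICK_LOG = [
    ("s1", 0, "/home"),
    ("s1", 40, "/products"),
    ("s1", 95, "/contact"),
    ("s2", 0, "/home"),
    ("s2", 12, "/contact"),
    ("s3", 0, "/home"),
    ("s3", 30, "/about"),
]


def seconds_to_reach(log, target_page):
    """Return {session_id: seconds until the session first reached target_page}."""
    first_hit = {}
    for session_id, seconds, page in log:
        if page == target_page:
            # Keep only the earliest visit to the target page in each session.
            if session_id not in first_hit or seconds < first_hit[session_id]:
                first_hit[session_id] = seconds
    return first_hit


# Sessions that never reached the contact page (here, "s3") are left out.
times = seconds_to_reach(CLICK_LOG, "/contact")
typical_delay = median(times.values())
print(f"Median time to reach the contact page: {typical_delay} seconds")
```

A long typical delay could be one signal that the page is hard to find, although a real analysis would also need to consider visitors who gave up before finding it.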

Data scientists can go further than interpreting data: they can also create new solutions to world problems that may come in the form of software or algorithms. These may exist as machine learning, artificial intelligence, or simply new apps or websites.

Today, big data is a vital tool for companies and organizations of all sizes, as it has changed what is considered possible when it comes to company outreach, recruitment, marketing, and customer service. In the past five years, data-driven businesses have grown in worth by $333 billion, and are now valued at roughly $1.2 trillion.

Put simply, data science is the best form of behavioral science available to computers: it helps nudge humans toward decisions, and then gives humans the opportunity to put the research into practice.

Controversies

Most likely, you yourself have been part of a data science controversy at one point or another—you just may not have known it.

While understanding users is highly beneficial for companies, organizations, and apps, there are many ways in which users are manipulated based on these parties’ goals. Therein lies the controversy at the heart of data science: will it be used for good or evil?

Of course, there are ways data science is used for good: if your technology can nudge you toward a positive decision that you’ve been thinking about making, you may be grateful. If an organization knows that you are someone who might like to sign a worthy petition, for example, then being able to get this petition to you will ultimately benefit your cause of choice. In this way, data science can help an organization with a goal reach many users, and possibly effect positive real-world change.

However, in the hands of less altruistic designers, data science can have negative effects on our mental health, our decision making, our politics, and even our relationships. Having technology understand us so well can be detrimental when it continues to beg for more and more of our attention. As advertisements, videos, and articles that intrigue us continue to pop up and steal our attention, we are drawn further into our screens and away from our real lives. Teens today on average spend almost 7.5 hours per day on their screens, not including time for schoolwork.

Moreover, having technology understand us so well means that it feeds us information that aligns with our beliefs. This plays into our confirmation bias and limits how much we encounter and learn from other perspectives. The pairing of data science and social media has thus had polarizing effects on our political landscape, as described in Netflix’s The Social Dilemma.

Case Studies

Technology and privacy

Another major controversy around data science concerns privacy, which came to a head in a scandal involving Facebook and the political consulting firm Cambridge Analytica. Cambridge Analytica obtained the data of up to 87 million Facebook users without their consent through an app called “This Is Your Digital Life.” The consulting firm used the data to assist the 2016 political campaigns of Donald Trump and Ted Cruz, by analyzing what type of advertisement or news story would be most likely to influence how users voted, based on their likes and interests. #DeleteFacebook trended on Twitter as people took an interest in how privacy and social media could influence political outcomes. As a result of the scandal, Cambridge Analytica filed for bankruptcy in 2018, and in 2019 the U.S. Federal Trade Commission fined Facebook $5 billion. The Netflix documentary The Great Hack describes the scandal in detail.

Health and surveillance

In other ways, however, data can be used for positive change. For example, by logging which phones have been near one another and recording positive test results, Canada’s COVID Alert app is able to notify people if they’ve been around someone with COVID-19. These users are then advised to isolate, with the intent that a chain reaction will help slow the spread. In Canada, the app has had minimal actual effect; similar initiatives in parts of Asia, however, have been credited with helping to slow the pandemic, a difference often attributed to differing governmental values.

Related TDL articles

Algorithms for Simpler Decision-Making

In this article, PhD researcher Jason Burton helps us understand the inevitable relationship between algorithms and humans. While we don’t think of algorithms in personal terms, Burton shows us that algorithms are simply an extension of the human mind, and teaches how we can optimize their pervasive presence for our own benefit.

The Impact of Technology on our Choice Environments

In this episode of our podcast, The Decision Corner, host Brooke Struck sits down with policy researcher and analyst Gianluca Sgueo. Sgueo discusses the relationships between big data and privacy, choice architecture, and democracy in our society, as well as how governments rely on big data to tackle pressing societal issues and attract citizens as users.

Why Decision Science Matters

This perspective piece by K.V. Rao explains how data science helps big businesses make important decisions, and why it is such an essential tool for businesses of the future.
