Data Science
What is Data Science?
Data science is a multidisciplinary field that employs advanced analytical techniques, statistical methods, and machine learning algorithms to collect, analyze, and interpret large and diverse datasets. It combines elements of statistics, computer science, and domain expertise to uncover hidden patterns and insights, enabling organizations to make data-driven decisions.
The Basic Idea
Data science is becoming increasingly useful in our technology-driven world. Our widespread use of the internet and social media has led to increased access to billions and billions of pieces of information. But what do we do with all of this information?
By using complex analytical techniques to collect and analyze data, and leveraging artificial intelligence and programming, data science uncovers patterns to help organizations plan, strategize, and make data-driven decisions.
Broadly, data science takes raw data and summarizes it into a cohesive language for decision-makers, ranging from CEOs of large corporations to government officials. For example, Starbucks uses data from a location-analytics company, which reveals where target demographic groups are located and how traffic flows, to determine where to open new stores. Governments, in turn, use data science to identify vulnerable populations, analyzing factors such as poverty, education, and unemployment to decide where to allocate resources and target interventions.1
Data science is often confused with data analytics, but what sets it apart is its scope: rather than only describing what has already happened, data science uses past data to predict future trends and translates complex data into actionable insights. A data analyst uses data to answer questions like: “What was our monthly sales revenue for the last year?” In contrast, a data scientist uses data to ask: “Based on historical data, what will our sales be next quarter?”
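To make the contrast concrete, here is a minimal Python sketch of the two questions. The sales figures are invented for illustration, and the straight-line trend is only one simple assumption about how sales evolve; a real forecast would likely account for seasonality and uncertainty.

```python
# Hypothetical monthly sales revenue for one year (illustrative numbers only).
monthly_sales = [100.0, 104.0, 108.0, 112.0, 116.0, 120.0,
                 124.0, 128.0, 132.0, 136.0, 140.0, 144.0]


def fit_linear_trend(values):
    """Ordinary least squares fit of value = intercept + slope * month_index."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Analyst-style question: what was our sales revenue for the last year?
total_last_year = sum(monthly_sales)

# Data-scientist-style question: what will our sales be next quarter?
# Assumption: the linear trend of the past 12 months continues for 3 more months.
intercept, slope = fit_linear_trend(monthly_sales)
n_months = len(monthly_sales)
next_quarter_forecast = sum(
    intercept + slope * month for month in range(n_months, n_months + 3)
)

print(f"Revenue last year: {total_last_year:.1f}")
print(f"Forecast for next quarter: {next_quarter_forecast:.1f}")
```

The first number is a summary of data that already exists; the second is an extrapolation whose quality depends entirely on whether the assumed trend holds.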
The Step-by-Step
A data scientist would normally take the following steps to get from a bunch of numbers to meaningful insights.
1. Data ingestion: The first step involves collecting the data to be analyzed. This is done through numerous methods, both automated and manual. Data can come from structured sources like databases and APIs, as well as unstructured sources like social media, IoT devices, and web scraping. Using ETL (Extract, Transform, Load) tools, data scientists can automate the collection and initial transformation of data.
2. Data management and processing: Data comes in many different formats, so standardizing it makes analysis easier. How the data is standardized and stored depends on the organization’s preferences. Data cleaning is crucial at this stage, involving tasks like removing duplicates, handling missing values, and correcting invalid entries. Tools like SQL, Python (pandas), and data warehousing solutions are commonly used to manage and process data efficiently.
3. Data analysis: Here is where the magic happens. Data scientists dive into exploratory data analysis (EDA) to identify patterns, distributions, and potential biases. This involves statistical analysis and the development of machine learning models. Techniques such as regression analysis, clustering, and classification are applied to extract meaningful insights. Tools like Python (scikit-learn), R, and TensorFlow are commonly used for conducting these analyses.
4. Communication: The insights from the exploratory analysis are translated into an understandable report. Data visualizations are often provided to support the recommendations given and effectively communicate the findings to stakeholders. The report serves as a guide for businesses’ planning and action.
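The four steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the store data, column names, and function names are hypothetical, and the standard library stands in for the tools named above (pandas, scikit-learn) so the example runs without extra installs.

```python
import csv
import io
import statistics

# Hypothetical raw data (made up for illustration): one duplicate row,
# one missing value, and one invalid negative entry.
RAW_CSV = """store_id,foot_traffic,weekly_sales
A,100,1000
B,200,2000
B,200,2000
C,,1500
D,300,3000
E,400,-50
F,400,4000
"""


def ingest(text):
    """Step 1, data ingestion: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2, processing: drop duplicates, missing values, and invalid entries."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if not row["foot_traffic"] or not row["weekly_sales"]:
            continue  # missing value
        traffic = float(row["foot_traffic"])
        sales = float(row["weekly_sales"])
        if traffic < 0 or sales < 0:
            continue  # invalid entry: negative counts or sales
        cleaned.append(
            {"store_id": row["store_id"], "foot_traffic": traffic, "weekly_sales": sales}
        )
    return cleaned


def analyze(rows):
    """Step 3, analysis: summary statistics plus a least-squares regression line."""
    xs = [r["foot_traffic"] for r in rows]
    ys = [r["weekly_sales"] for r in rows]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "n_stores": len(rows),
        "mean_sales": mean_y,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


def report(summary):
    """Step 4, communication: turn the numbers into a plain-language finding."""
    return (
        f"Across {summary['n_stores']} stores, average weekly sales were "
        f"{summary['mean_sales']:.0f}. Each additional visitor is associated "
        f"with about {summary['slope']:.1f} in extra weekly sales."
    )


rows = clean(ingest(RAW_CSV))
summary = analyze(rows)
print(report(summary))
```

In this sketch invalid rows are simply dropped; depending on the organization, missing or invalid values might instead be corrected or imputed, which is a judgment call made at the cleaning stage.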
About the Authors
Samantha Lau
Samantha graduated from the University of Toronto, majoring in psychology and criminology. During her undergraduate degree, she studied how mindfulness meditation impacted human memory, which sparked her interest in cognition. Samantha is curious about the way behavioural science impacts design, particularly in the UX field. As she works to make behavioural science more accessible with The Decision Lab, she is preparing to start her Master of Behavioural and Decision Sciences degree at the University of Pennsylvania. In her free time, you can catch her at a concert or in a dance studio.
Emilie Rose Jones
Emilie currently works in Marketing & Communications for a non-profit organization based in Toronto, Ontario. She completed her Masters of English Literature at UBC in 2021, where she focused on Indigenous and Canadian Literature. Emilie has a passion for writing and behavioural psychology and is always looking for opportunities to make knowledge more accessible.