Markov Decision Processes
What are Markov Decision Processes?
A Markov decision process (MDP) is a mathematical framework for decision-making in scenarios where outcomes are partly random and partly under the control of a decision-maker. MDPs model choices made over time by weighing the actions available in each state and considering how each action affects future states and potential rewards.
The Basic Idea
To grasp how Markov decision processes work, imagine you are playing a game of chess against a computer. You move your queen to check the computer’s king. The computer now has to decide where to move to avoid losing. In this scenario, the king is considered the agent. Meanwhile, the king’s position when checked, square E8, is known as its current state (S). From this position, the king can take various actions (A), such as moving forward (E7), to its right (D8), or to its left (F8).
The computer evaluates each possible action based on its expected outcome, seeking to avoid checkmate and put itself in a favorable position for future moves against you. In other words, the computer must determine the potential reward of each action. After evaluating all possible actions, the computer decides to move to D8, making that square its new state (S’). Once the move is made and its outcome observed, the computer refines its approach for future moves by associating similar board states with the actions that have historically led to better outcomes.1
The series of steps that the computer took is known as a Markov decision process (MDP). In an MDP, a computer uses a mathematical model to evaluate an agent’s current state (checked at E8), the environment of the system (the game of chess), the possible actions (move to E7, D8, or F8), and the rewards of each potential new state. Markov decision processes are concerned only with the agent’s current state, not its historical states. For example, the computer would not consider where the king was before E8, because MDPs assume that the current state holds all the information relevant to future decisions, a property known as the Markov property.2
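To make this concrete, here is a minimal sketch in Python of the king’s one-step decision. The transition probabilities and rewards are invented purely for illustration, as are the names transitions and expected_reward; a real chess engine evaluates positions far more elaborately.

```python
# A minimal sketch of the king's one-step decision, with made-up numbers.
# Each action leads to possible outcomes with assumed probabilities and
# rewards; the agent picks the action with the highest expected reward.

# Hypothetical model: action -> list of (probability, reward) outcomes
transitions = {
    "E7": [(0.6, -1.0), (0.4, -5.0)],  # assumed to be a riskier square
    "D8": [(0.9,  2.0), (0.1, -1.0)],  # assumed to be mostly safe
    "F8": [(0.7,  0.5), (0.3, -3.0)],
}

def expected_reward(outcomes):
    """Probability-weighted average reward over possible outcomes."""
    return sum(p * r for p, r in outcomes)

# Evaluate every legal action from the current state (E8) and pick the best.
best_action = max(transitions, key=lambda a: expected_reward(transitions[a]))
print(best_action)  # -> "D8" under these assumed numbers
```

Note that the model only needs the current state E8 to make this choice; nothing about earlier board positions enters the calculation, which is the Markov property in action.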
Markov decision processes are useful in systems where a variety of choices are available in uncertain environments. Computers can be trained to automate decision-making in a wide range of dynamic settings to maximize rewards. For instance, if you were attending a conference and your company wanted to minimize travel costs, an MDP could help determine the optimal route. A fisherman could use an MDP to estimate how many salmon to harvest each year to maximize profit while preserving long-term yield. Urban planners can use the same decision-making process to choose the optimal duration of a red light at an intersection, ensuring safety while avoiding long wait times.3
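For multi-step problems like the fishing example, MDPs are typically solved with dynamic-programming methods such as value iteration, which repeatedly backs up expected long-run rewards until the value of each state stabilizes. The sketch below uses an invented three-state harvest model; all population levels, probabilities, rewards, and the discount factor are illustrative assumptions, not real fishery data.

```python
# A toy version of the fishing example, solved by value iteration.
# States are fish-population levels; actions are harvest sizes.

states = ["low", "medium", "high"]
actions = ["harvest_little", "harvest_lots"]
gamma = 0.9  # discount factor: how much future profit matters today

# model[state][action] -> list of (probability, next_state, reward)
model = {
    "low": {
        "harvest_little": [(0.7, "medium", 1.0), (0.3, "low", 1.0)],
        "harvest_lots":   [(0.9, "low", 3.0), (0.1, "medium", 3.0)],
    },
    "medium": {
        "harvest_little": [(0.6, "high", 2.0), (0.4, "medium", 2.0)],
        "harvest_lots":   [(0.7, "low", 5.0), (0.3, "medium", 5.0)],
    },
    "high": {
        "harvest_little": [(0.8, "high", 3.0), (0.2, "medium", 3.0)],
        "harvest_lots":   [(0.6, "medium", 8.0), (0.4, "low", 8.0)],
    },
}

# Value iteration: repeatedly back up expected long-run rewards.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in model[s][a])
            for a in actions
        )
        for s in states
    }

# Read off the policy: the best action in each state under the learned values.
policy = {
    s: max(
        actions,
        key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in model[s][a]),
    )
    for s in states
}
print(policy)  # maps each population level to its best harvest action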
In short, Markov decision processes are applied in various sectors to solve complex problems by breaking them down into manageable states. Whether used in computer science, resource management, or urban planning, MDPs offer a structured way to navigate uncertainty and maximize long-term rewards.
About the Author
Emilie Rose Jones
Emilie currently works in Marketing & Communications for a non-profit organization based in Toronto, Ontario. She completed her master’s in English Literature at UBC in 2021, where she focused on Indigenous and Canadian literature. Emilie has a passion for writing and behavioural psychology and is always looking for opportunities to make knowledge more accessible.