Markov Decision Processes

What are Markov Decision Processes?

A Markov decision process (MDP) is a mathematical framework for decision-making in scenarios where outcomes are partly random and partly under the control of a decision-maker. MDPs model choices made over time, weighing how each possible action taken in a given state affects future states and potential rewards.
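In formal terms, an MDP has four ingredients: a set of states S, a set of actions A, transition probabilities P(S′ | S, A) describing how likely each next state is, and a reward function R that scores each action taken in each state. As a rough illustration, here is a minimal Python sketch of those four pieces; the scenario and all of the numbers are invented for the example:

```python
# A minimal sketch of an MDP's four ingredients, using invented numbers.
states = ["sunny", "rainy"]    # S: the possible states
actions = ["walk", "drive"]    # A: the available actions

# P(next_state | state, action): transition probabilities for each pair
transitions = {
    ("sunny", "walk"):  {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "drive"): {"sunny": 0.8, "rainy": 0.2},
    ("rainy", "walk"):  {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "drive"): {"sunny": 0.3, "rainy": 0.7},
}

# R(state, action): the immediate reward for taking an action in a state
rewards = {
    ("sunny", "walk"): 5, ("sunny", "drive"): 2,
    ("rainy", "walk"): -3, ("rainy", "drive"): 1,
}
```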

[Image: graphical representation of the MDP model]

The Basic Idea

To grasp how Markov decision processes work, imagine you are playing a game of chess against a computer. You move your queen to put the computer's king in check. The computer now has to decide where to move to avoid losing. In this scenario, the king is considered the agent. Meanwhile, the position of the king when checked, square E8, is known as its current state (S). From this position, the king has the option to take various actions (A), such as moving up (E7), to its right (D8), or to its left (F8).

The computer evaluates each possible action based on its expected outcome, seeking to avoid checkmate and to put itself in a favorable position for future moves against you. In other words, the computer must determine the potential reward of each action. After evaluating all possible actions, the computer decides to move to D8, making that square its new state (S'). Once the move is made and the outcome observed, the computer refines its approach for future decisions by associating similar board states with the actions that have historically led to better outcomes.1

[Image: chessboard with lettered and numbered squares, showing the king checked at E8, empty squares at E7, D8, and F8, and arrows indicating the possible moves]

The series of steps that the computer took is known as a Markov decision process (MDP). In an MDP, a computer uses a mathematical model to evaluate an agent's current state (checked at E8), the environment of the system (the game of chess), the possible actions (move to E7, D8, or F8), and the rewards of each potential new state. Markov decision processes consider only the current state of the agent, not historical states. For example, the computer would not consider where the king was before E8, as MDPs assume that the agent's current state already contains all the relevant information from its history.2
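To make this concrete, here is a small Python sketch of the king's one-step decision. The reward numbers are invented for illustration; the point is that the choice depends only on the current state (E8) and the expected reward of each action, not on how the king arrived there:

```python
# One-step MDP decision for the checked king, with illustrative (invented) rewards.
current_state = "E8"  # the king's current state: checked at E8

# Expected reward of each action from E8 (how favorable each square is).
# These numbers are assumptions made up for the sake of the example.
expected_reward = {
    "E7": -2.0,   # moving up keeps the king exposed
    "D8": 3.0,    # moving right reaches a safer square
    "F8": 1.0,    # moving left is safe but cramped
}

# The Markov property: the decision uses only the current state's action values,
# not the history of moves that led the king to E8.
best_action = max(expected_reward, key=expected_reward.get)
new_state = best_action  # here each action leads directly to the named square

print(f"From {current_state}, move to {new_state}")  # -> From E8, move to D8
```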

Markov decision processes are useful in systems that offer a variety of choices in uncertain environments. Computers can be trained to automate decision-making in a wide range of dynamic settings to maximize rewards. For instance, if you were attending a conference and your company wanted to minimize travel costs, an MDP could help determine the optimal route. A fisherman could use an MDP to estimate how many salmon to harvest each year to maximize profit while ensuring long-term yield. Urban planners can use the same decision-making process to choose the duration of a red light at an intersection that keeps drivers safe without creating long wait times.3
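How would a computer actually find such a policy? A standard approach is value iteration, a dynamic-programming method that repeatedly updates an estimate of how much long-run reward each state is worth, then reads off the best action in every state. The sketch below applies it to a toy version of the fishing example; the states, probabilities, and profits are invented purely to show the mechanics:

```python
# Value iteration on a toy "fish stock" MDP with invented numbers.
# States: the stock level is "low" or "high". Actions: "fish" (harvest) or "wait".

states = ["low", "high"]
actions = ["fish", "wait"]
gamma = 0.9  # discount factor: how much future profit matters

# transitions[(s, a)] = {next_state: probability}; rewards[(s, a)] = immediate profit
transitions = {
    ("low", "fish"):  {"low": 0.9, "high": 0.1},   # overfishing keeps the stock low
    ("low", "wait"):  {"low": 0.4, "high": 0.6},   # waiting lets the stock recover
    ("high", "fish"): {"low": 0.5, "high": 0.5},
    ("high", "wait"): {"low": 0.1, "high": 0.9},
}
rewards = {
    ("low", "fish"): 1.0, ("low", "wait"): 0.0,
    ("high", "fish"): 5.0, ("high", "wait"): 0.0,
}

# Value iteration: V(s) <- max over a of [ R(s,a) + gamma * sum over s' of P(s'|s,a) * V(s') ]
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {
        s: max(
            rewards[(s, a)]
            + gamma * sum(p * V[s2] for s2, p in transitions[(s, a)].items())
            for a in actions
        )
        for s in states
    }

# Read off the best action in each state from the converged values.
policy = {
    s: max(
        actions,
        key=lambda a: rewards[(s, a)]
        + gamma * sum(p * V[s2] for s2, p in transitions[(s, a)].items()),
    )
    for s in states
}
print(policy)  # -> {'low': 'wait', 'high': 'fish'}
```

With these made-up numbers, the computed policy is to let a depleted stock recover and harvest only when it is plentiful, which is exactly the profit-versus-sustainability trade-off described above.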

In short, Markov decision processes are applied in various sectors to solve complex problems by breaking them down into manageable states. Whether used in computer science, resource management, or urban planning, MDPs offer a structured way to navigate uncertainty and maximize positive rewards.

"The future is independent of the past given the present."

— David Silver, principal research scientist at Google DeepMind and lead researcher on the AlphaGo project4

About the Author

Emilie Rose Jones

Emilie currently works in Marketing & Communications for a non-profit organization based in Toronto, Ontario. She completed her Master's in English Literature at UBC in 2021, where she focused on Indigenous and Canadian Literature. Emilie has a passion for writing and behavioural psychology and is always looking for opportunities to make knowledge more accessible.
