Markov Decision Processes
What are Markov Decision Processes?
A Markov decision process (MDP) is a mathematical framework for decision-making in scenarios where outcomes are partly random and partly under the control of a decision-maker. MDPs model choices made over time by weighing the actions available in each state and considering how each action affects future states and potential rewards.
The Basic Idea
To grasp how Markov decision processes work, imagine you are playing a game of chess against a computer. You move your queen to check the computer’s king. The computer now has to decide where to move to avoid losing. In this scenario, the king is considered the agent. Meanwhile, the king’s position when checked, square E8, is known as its current state (S). From this position, the king can take various actions (A), such as moving forward (E7), to its right (D8), or to its left (F8).
The computer evaluates each possible action based on its expected outcome, seeking to avoid checkmate and put itself in a favorable position for future moves against you. In other words, the computer must determine the potential reward of each action. After evaluating all possible actions, the computer decides to move to D8, making that square its new state (S’). Once the move is made and its outcome observed, the computer refines its approach for future moves by associating similar board states with the actions that have historically led to better outcomes.1
The series of steps that the computer took is known as a Markov decision process (MDP). In an MDP, a computer uses a mathematical model to evaluate an agent’s current state (checked at E8), the environment of the system (the game of chess), the possible actions (move to E7, D8, or F8), and the rewards of each potential new state. Markov decision processes are concerned only with the agent’s current state, not its historical states. For example, the computer would not consider where the king was before E8, because MDPs assume that the current state holds all the information relevant to future decisions, a property known as the Markov property.2
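To make this concrete, here is a minimal sketch in Python of the king’s one-step decision. The transition probabilities and rewards are invented purely for illustration, as are the names transitions and expected_reward; a real chess engine evaluates positions far more elaborately.

```python
# A minimal sketch of the king's one-step decision, with made-up numbers.
# Each action leads to possible outcomes with assumed probabilities and
# rewards; the agent picks the action with the highest expected reward.

# Hypothetical model: action -> list of (probability, reward) outcomes
transitions = {
    "E7": [(0.6, -1.0), (0.4, -5.0)],  # assumed to be a riskier square
    "D8": [(0.9,  2.0), (0.1, -1.0)],  # assumed to be mostly safe
    "F8": [(0.7,  0.5), (0.3, -3.0)],
}

def expected_reward(outcomes):
    """Probability-weighted average reward over possible outcomes."""
    return sum(p * r for p, r in outcomes)

# Evaluate every legal action from the current state (E8) and pick the best.
best_action = max(transitions, key=lambda a: expected_reward(transitions[a]))
print(best_action)  # -> "D8" under these assumed numbers
```

Note that the model only needs the current state E8 to make this choice; nothing about earlier board positions enters the calculation, which is the Markov property in action.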
Markov decision processes are useful in systems where a variety of choices are available in uncertain environments. Computers can be trained to automate decision-making in a wide range of dynamic settings to maximize rewards. For instance, if you were attending a conference and your company wanted to minimize travel costs, an MDP could help determine the optimal route. A fisherman could use an MDP to estimate how many salmon to harvest each year to maximize profit while preserving long-term yield. Urban planners can use the same decision-making process to choose the optimal duration of a red light at an intersection, ensuring safety while avoiding long wait times.3
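For multi-step problems like the fishing example, MDPs are typically solved with dynamic-programming methods such as value iteration, which repeatedly backs up expected long-run rewards until the value of each state stabilizes. The sketch below uses an invented three-state harvest model; all population levels, probabilities, rewards, and the discount factor are illustrative assumptions, not real fishery data.

```python
# A toy version of the fishing example, solved by value iteration.
# States are fish-population levels; actions are harvest sizes.

states = ["low", "medium", "high"]
actions = ["harvest_little", "harvest_lots"]
gamma = 0.9  # discount factor: how much future profit matters today

# model[state][action] -> list of (probability, next_state, reward)
model = {
    "low": {
        "harvest_little": [(0.7, "medium", 1.0), (0.3, "low", 1.0)],
        "harvest_lots":   [(0.9, "low", 3.0), (0.1, "medium", 3.0)],
    },
    "medium": {
        "harvest_little": [(0.6, "high", 2.0), (0.4, "medium", 2.0)],
        "harvest_lots":   [(0.7, "low", 5.0), (0.3, "medium", 5.0)],
    },
    "high": {
        "harvest_little": [(0.8, "high", 3.0), (0.2, "medium", 3.0)],
        "harvest_lots":   [(0.6, "medium", 8.0), (0.4, "low", 8.0)],
    },
}

# Value iteration: repeatedly back up expected long-run rewards.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in model[s][a])
            for a in actions
        )
        for s in states
    }

# Read off the policy: the best action in each state under the learned values.
policy = {
    s: max(
        actions,
        key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in model[s][a]),
    )
    for s in states
}
print(policy)  # maps each population level to its best harvest action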
In short, Markov decision processes are applied in various sectors to solve complex problems by breaking them down into manageable states. Whether used in computer science, resource management, or urban planning, MDPs offer a structured way to navigate uncertainty and maximize long-term rewards.
About the Author
Emilie Rose Jones
Emilie currently works in Marketing & Communications for a non-profit organization based in Toronto, Ontario. She completed her master’s in English Literature at UBC in 2021, where she focused on Indigenous and Canadian literature. Emilie has a passion for writing and behavioural psychology and is always looking for opportunities to make knowledge more accessible.