Markov Decision Processes (MDPs) form a rich class of mathematical models for sequential decision problems under uncertainty and provide a rigorous foundation for Reinforcement Learning (RL). The first part of this course combines techniques from optimization and stochastics to build a modeling, theoretical, and algorithmic foundation for MDPs. Topics covered include finite- and infinite-horizon MDPs; Bellman's equations of dynamic programming; value iteration, policy iteration, and linear programming-based solution algorithms; partially observable MDPs; robust MDPs; stochastic games; continuous-time MDPs; semi-Markov decision processes; and continuous-time deterministic control. The second part of the course builds on this foundation to introduce fundamental ideas and solution techniques in RL, including Monte Carlo Policy Iteration, Q-learning, Temporal-Difference Learning, and Neuro-Dynamic Programming. Prereq: knowledge of optimization and stochastic models at the undergraduate level and familiarity with a computer programming language such as Python.
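As a flavor of the algorithmic material, here is a minimal sketch of value iteration for a finite MDP, one of the solution methods listed above. It is illustrative only, not course material; the array layout (P indexed by action, state, next state) and the function name are this example's own conventions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration for a finite MDP (illustrative sketch).

    P: transition probabilities, shape (A, S, S); P[a, s, s2] = Pr(s2 | s, a)
    R: expected rewards, shape (A, S); R[a, s] = reward for action a in state s
    Returns the optimal value function and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality update:
        # V(s) <- max_a [ R(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2) ]
        Q = R + gamma * (P @ V)        # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# A toy two-state, two-action MDP to exercise the routine.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.0, 1.0]]])   # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, policy = value_iteration(P, R)
print(V, policy)
```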