IE 8571: Advanced Reinforcement Learning and Dynamic Programming

4 Credits

Topics include methods for solving sequential decision-making problems. We will introduce the modeling framework of Markov Decision Processes (MDPs) and the classic solution approach of dynamic programming, along with its traditional solution methods, value iteration and policy iteration. We will then move on to model-free methods for finding optimal policies for MDPs, such as Monte Carlo and Temporal Difference methods. We will discuss how these methods extend to problems with large state spaces, where parametric approximations such as deep neural networks become necessary. Examples will be drawn from navigation, medicine, game play, and other areas. We will also discuss convergence proofs for a variety of algorithms in the so-called 'tabular setting', e.g., policy iteration, value iteration, Q-learning, and Sarsa. Prerequisites: undergraduate-level knowledge of probability, optimization, and linear algebra; knowledge of Markov chains at the level of IE 8532 or equivalent; ability to read and write mathematical proofs.

View on University Catalog

All Instructors

Average: A (3.879). Most common grade: A (91%).

This total also includes data from semesters with unknown instructors.

11 students

      Contribute on our Github

Gopher Grades is maintained by Social Coding, with data from Summer 2017 to Fall 2023 provided by the Office of Institutional Data and Research.

      Privacy Policy