MA661 Dynamic Programming and Reinforcement Learning
Course Catalog Description
The main purpose of this course is to present an introduction to dynamic programming as the
most popular methodology for learning and control of dynamic stochastic systems. We discuss
basic models, some theoretical results and numerical methods for these problems. They will be
developed starting from basic models of dynamical systems, through finite-horizon stochastic
problems, to infinite-horizon stochastic models of fully or partially observable systems.
Throughout the class special attention will be paid on the application of dynamic programming
to statistical learning. The class will include introduction to approximate dynamic programming
techniques, which are used in statistical learning, such as tree-based methods for classification,
Bayesian learning, etc. The concepts and methods will be illustrated by various applications
including learning in stochastic networks, engineering, business, and finance.
1. Knowledge of the models listed in the syllabus and the modelling ability to identify
suitable problem formulation for typical applications.
2. Know and understanding of the notions of value function, decision rules, policy, and
dynamic programming equations; knowledge of the different types of decision rules and
3. Ability to formulate and use the principle of optimality, to analyze the optimality
equations to gain insight of the optimal policy.
4. Familiarity with the methods of backward induction, value and policy iteration, as well
as linear programming techniques for solving the dynamic models discussed in class.
5. Ability to identify the optimal policy for a given problem (as a result of a numerical or
theoretical analysis), its value, and interpretation in the context of problem.
6. Know and understand the concept of partial information and the mathematical models
addressing partially observable dynamical systems.
7. Ability to properly formulate approximate dynamic programming for learning; feature
space equations and model framework.
8. Ability to properly choose a numerical approach to a given model, apply the method
correctly, and be aware of its convergence.
Basic knowledge of analysis, linear algebra, and probability is necessary. Large
portion of the class will deal with controlled Markov chains, and knowledge of some
fundamental notions and results of the theory of Markov chains is useful.
The students who
need a review of the theory of Markov chains may consult the following textbooks:
1. Çinlar: “Introduction to Stochastic Processes”, Prentice Hall, 1975 (chapters 5 and 6)
2. R. Grimmett and D.R. Stirzaker, “Probability and Random Processes”, Oxford 1992
In fact, almost any book on stochastic processes will provide the necessary basic information.
Textbook : Lecture notes distributed in class.
Supplementary material :
M.L. Puterman, Markov Decision Processes, John Wiley & Sons, 2005
D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, Athena Scientific; 3rd edition,
2005 and Vol. II (Approximate Dynamic Programming), Athena Scientific; 4th edition, 2012
W. B. Powell, Approximate Dynamic Programming, John Wiley & Sons, 2007
None of the books is required. The lectures will draw from many sources.
Seven-eight assignments constituting 60% of the final score.
Final Exam 30%
||Sequential decision problems. The principle of optimality and the shortest path.
||Stochastic discrete-time control problems. Controlled Markov chains
| Week 3
||Dynamic programming equations. Monotonicity properties.
| Week 4
||Discounted infinite horizon problems.
| Week 5
||Value and policy iteration methods. Linear programming approach.
| Week 6
||Undiscounted infinite horizon problems.
| Week 7
| Week 8
||Average Cost Problem.
| Week 9
||Multiarmed bandit problems.
| Week 10
||Partially observable Markov decision problems.
| Week 11
| Week 12
||Introduction to approximate dynamic programming. Feature space Equations
| Week 13
||Temporal difference methods. Q-learning.
| Week 14
||Policy gradient method.