MA661 Dynamic Programming and Reinforcement Learning
Course Catalog Description
1. Knowledge of the models listed in the syllabus and the modelling ability to identify
suitable problem formulation for typical applications.
2. Know and understanding of the notions of value function, decision rules, policy, and dynamic programming equations; knowledge of the different types of decision rules and policies.
3. Ability to formulate and use the principle of optimality, to analyze the optimality equations to gain insight of the optimal policy.
4. Familiarity with the methods of backward induction, value and policy iteration, as well as linear programming techniques for solving the dynamic models discussed in class.
5. Ability to identify the optimal policy for a given problem (as a result of a numerical or theoretical analysis), its value, and interpretation in the context of problem.
6. Know and understand the concept of partial information and the mathematical models addressing partially observable dynamical systems.
7. Ability to properly formulate approximate dynamic programming for learning; feature space equations and model framework.
8. Ability to properly choose a numerical approach to a given model, apply the method correctly, and be aware of its convergence.
Basic knowledge of analysis, linear algebra, and probability is necessary. Large portion of the class will deal with controlled Markov chains, and knowledge of some fundamental notions and results of the theory of Markov chains is useful.
The students who need a review of the theory of Markov chains may consult the following textbooks:
1. Çinlar: “Introduction to Stochastic Processes”, Prentice Hall, 1975 (chapters 5 and 6)
2. R. Grimmett and D.R. Stirzaker, “Probability and Random Processes”, Oxford 1992 (chapter 6)
In fact, almost any book on stochastic processes will provide the necessary basic information.
Textbook : Lecture notes distributed in class.
Supplementary material :
M.L. Puterman, Markov Decision Processes, John Wiley & Sons, 2005
D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, Athena Scientific; 3rd edition, 2005 and Vol. II (Approximate Dynamic Programming), Athena Scientific; 4th edition, 2012
W. B. Powell, Approximate Dynamic Programming, John Wiley & Sons, 2007
None of the books is required. The lectures will draw from many sources.
Seven-eight assignments constituting 60% of the final score.
Class work 10%
Final Exam 30%
|Week 1||Sequential decision problems. The principle of optimality and the shortest path.|
|Week 2||Stochastic discrete-time control problems. Controlled Markov chains|
|Week 3||Dynamic programming equations. Monotonicity properties.|
|Week 4||Discounted infinite horizon problems.|
|Week 5||Value and policy iteration methods. Linear programming approach.|
|Week 6||Undiscounted infinite horizon problems.|
|Week 7||Optimal stopping.|
|Week 8||Average Cost Problem.|
|Week 9||Multiarmed bandit problems.|
|Week 10||Partially observable Markov decision problems.|
|Week 12||Introduction to approximate dynamic programming. Feature space Equations|
|Week 13||Temporal difference methods. Q-learning.|
|Week 14||Policy gradient method.|