# MA661 Dynamic Programming and Reinforcement Learning

Course Catalog Description

# Objective

Instructors

Professor | Office | |
---|---|---|

Darinka Dentcheva | darinka.dentcheva@stevens.edu |

More Information

# Course Outcomes

1. Knowledge of the models listed in the syllabus and the modelling ability to identify
suitable problem formulation for typical applications.

2. Know and understanding of the notions of value function, decision rules, policy, and
dynamic programming equations; knowledge of the different types of decision rules and
policies.

3. Ability to formulate and use the principle of optimality, to analyze the optimality
equations to gain insight of the optimal policy.

4. Familiarity with the methods of backward induction, value and policy iteration, as well
as linear programming techniques for solving the dynamic models discussed in class.

5. Ability to identify the optimal policy for a given problem (as a result of a numerical or
theoretical analysis), its value, and interpretation in the context of problem.

6. Know and understand the concept of partial information and the mathematical models
addressing partially observable dynamical systems.

7. Ability to properly formulate approximate dynamic programming for learning; feature
space equations and model framework.

8. Ability to properly choose a numerical approach to a given model, apply the method
correctly, and be aware of its convergence.

**Prerequisites:**

Basic knowledge of analysis, linear algebra, and probability is necessary. Large
portion of the class will deal with controlled Markov chains, and knowledge of some
fundamental notions and results of the theory of Markov chains is useful.

The students who
need a review of the theory of Markov chains may consult the following textbooks:

1. Çinlar: “Introduction to Stochastic Processes”, Prentice Hall, 1975 (chapters 5 and 6)

2. R. Grimmett and D.R. Stirzaker, “Probability and Random Processes”, Oxford 1992
(chapter 6)

In fact, almost any book on stochastic processes will provide the necessary basic information.

**Textbook :** Lecture notes distributed in class.

**Supplementary material :**

M.L. Puterman, Markov Decision Processes, John Wiley & Sons, 2005

D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, Athena Scientific; 3rd edition,
2005 and Vol. II (Approximate Dynamic Programming), Athena Scientific; 4th edition, 2012

W. B. Powell, Approximate Dynamic Programming, John Wiley & Sons, 2007

**None of the books is required. The lectures will draw from many sources.**

Grading

# Grading Policies

Seven-eight assignments constituting 60% of the final score.

Class
work 10%

Final Exam 30%

Lecture Outline

Date | Topic | Reading |
---|---|---|

Week 1 | Sequential decision problems. The principle of optimality and the shortest path. | |

Week 2 | Stochastic discrete-time control problems. Controlled Markov chains | |

Week 3 | Dynamic programming equations. Monotonicity properties. | |

Week 4 | Discounted infinite horizon problems. | |

Week 5 | Value and policy iteration methods. Linear programming approach. | |

Week 6 | Undiscounted infinite horizon problems. | |

Week 7 | Optimal stopping. | |

Week 8 | Average Cost Problem. | |

Week 9 | Multiarmed bandit problems. | |

Week 10 | Partially observable Markov decision problems. | |

Week 11 | Examples | |

Week 12 | Introduction to approximate dynamic programming. Feature space Equations | |

Week 13 | Temporal difference methods. Q-learning. | |

Week 14 | Policy gradient method. |