FE529 GPU Computing in Finance (Deep Reinforcement Learning: Introduction and Experiments)
In this course, we will try to get an overview of the most controversial topic in the field of artificial intelligence: reinforcement learning and its most recent extension, deep reinforcement learning. In addition to the theoretical study, we will emphasize the experiments that we can do in the Hanlon Financial Systems Labs.
Throughout the history of artificial intelligence, the difficulty of implementing ideas has been the main obstacle, and it has pushed the field into winter several times. The most recent breakthrough, using deep neural networks to approximate functions, once again shows that the same old ideas were right. Rather than debating without evidence, making something work first and then explaining why it works seems to be the right strategy to succeed in this field. Therefore, we should put more effort into equipping ourselves with code packages that work for us. In the future, with the foundation you build in this course, you can choose to dive deeper into the unsolved problems with confidence.
In this course, we will spend only a small portion of the time explaining the theories that you can read in the textbook. Most of our time will be used to understand our experiments and to try to improve the performance of our code. The workstations in the Hanlon Financial Systems Labs make this course possible with their top-of-the-line CPUs and GPUs. The opportunity to quickly start doing hands-on experiments in the challenging field of deep reinforcement learning also makes this course unique.
In terms of theory, we will try to give an overview of the wide framework of this field. Starting from how the value of being in a certain state of a small environment is learned using a table, we will end up talking about how a piece of knowledge in the real world can be modeled and learned using a deep neural network that can fit in the workstation. This wide overview lets students get to know more of the existing promising ideas and see where this field is heading. We will discuss the missing parts of this theoretical framework and propose some ideas to fill them in. You should also come to see the open challenges and how far other researchers have progressed on them.
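As a rough illustration of the tabular starting point mentioned above, the following is a minimal sketch of learning state values with a table using TD(0). The environment is an assumed five-state random walk chosen only for illustration, and the function name and parameter values are hypothetical; none of this is taken from the course materials.

```python
import random


def td0_state_values(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a small random walk with tabular TD(0).

    Assumed toy environment: states 1..num_states lie on a line, the agent
    moves left or right with equal probability, and the episode ends when it
    steps off either end. Only exiting on the right gives a reward of 1.
    """
    rng = random.Random(seed)
    # The value table; indices 0 and num_states + 1 are terminal states
    # whose value stays at 0.
    values = [0.0] * (num_states + 2)
    for _ in range(episodes):
        state = (num_states + 1) // 2  # start each episode in the middle
        while 0 < state < num_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            # TD(0) update: move the estimate toward the one-step target.
            td_target = reward + gamma * values[next_state]
            values[state] += alpha * (td_target - values[state])
            state = next_state
    return values[1:-1]  # drop the two terminal entries


if __name__ == "__main__":
    for index, value in enumerate(td0_state_values(), start=1):
        print(f"state {index}: {value:.3f}")
```

In this toy setting the learned values should rise from the leftmost to the rightmost state, since states closer to the right exit are more likely to collect the reward. Replacing the table with a neural network that maps a state to its value is the step that leads toward the deep methods covered later in the course.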
Note: The prerequisite of FE522 C++ Programming in Finance is to be removed. Please submit a prerequisite override request in the system. We use Python for all the coding.
- Students will get a clear overview of the theoretical framework of deep reinforcement learning.
- Students will run and understand a code package that tests different theoretical parts of deep reinforcement learning.
- Students will try to solve some challenges and improve the performance of the code.
The main assignments of this course are implementations of deep reinforcement learning methods in Python. The experiments we do in this course may require a Linux environment. The software environment will be prepared on the workstations in the Hanlon Financial Systems Lab on the 4th floor of the Babbio Center. Students will have time to test the provided code during the in-class experiment sessions. Since most students will need more time before and after class to further understand the code and solve problems, it is recommended that students come to the lab when it is not occupied by other classes. It is also recommended that students set up their own computers when possible.
 Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
 Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
 Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. PMLR, 2016.
 Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).
 Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
 Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. "Reward is enough." Artificial Intelligence (2021): 103535.
Attendance: Attendance will be taken every week for both in-person and virtual students.
Participation: Participation will not be recorded for grading purposes, but extra points can be given for outstanding in-class participation.
Homework: The readings must be finished before the lecture. The assignments are coding homework that must be submitted before the day of the in-class experiment session. This ensures that students investigate the code before class. The in-class experiment session will be used to discuss the details and show some solutions.
Project(s): The midterm presentation and final presentation are graded on the performance of the learning methods the students implement and on the quality of the presentation itself. Students are expected to use slides to explain the details of their implementations.
- 10% Attendance
- 30% Homework
- 30% Midterm Presentation
- 30% Final Presentation
Late Policy: Late coding homework submissions will receive 0 points because the solutions will be discussed after the due date.
Tentative Course Schedule
| Week | Topic | Reading / Experiment |
|---|---|---|
| Week 1 | Introduction and Setup | Textbook Chapters 1, 2, 3, 4 |
| Week 2 | MDP, Dynamic Programming | Textbook Chapters 5, 6, 7 |
| Week 3 | In-class Experiment | DQN with CartPole |
| Week 4 | Value Based Methods, Q-learning | Textbook Chapters 9, 10, 11 |
| Week 5 | In-class Experiment | DQN with Atari |
| Week 6 | Policy Gradient Methods | Textbook Chapter 12 |
| Week 7 | In-class Experiment | DDPG with Pendulum |
| Week 8 | Actor-critic Methods | Textbook Chapter 13 |
| Week 9 | Midterm Presentation | A2C with CartPole |
| Week 10 | Model Based, Dyna Architecture | Textbook Chapter 8 |
| Week 11 | Prioritized Experience Replay | Improving DQN with PER |
| Week 12 | Proximal Policy Optimization | CartPole with PPO |
| Week 13 | Experience and Gradient Sharing | |
| Week 14 | Final Presentation | |