FE529 GPU Computing in Finance (Deep Reinforcement Learning: Introduction and Experiments)
Campus | Fall | Spring | Summer |
---|---|---|---|
On Campus | X | | |
Web Campus | X | | |
Course Description
Since the 1960s, applying dynamic programming to problems in finance has been a long-standing goal of research. Throughout the history of artificial intelligence, putting theory into practice has often proven to be the biggest obstacle, and the recent success of AI has been made possible by engineers contributing open-source implementations. Artificial intelligence has become a fundamental part of building applications in finance. In this course, we will survey some of the basic topics in reinforcement learning and its most recent extension, deep reinforcement learning. Beyond the theory, students benefit from help and guidance with the engineering difficulties of building applications, so this course emphasizes experiments performed on the Hanlon Financial Systems Labs computing infrastructure, with the initial setup ready to use. The course provides hands-on experience implementing deep reinforcement learning algorithms, with applications in the financial industry including, for example, Dynamic Asset Allocation and Consumption, Derivatives Pricing and Hedging, Optimal Trade Order Execution, and Optimal Market Making.
Note: The prerequisite of FE522 C++ Programming in Finance is to be removed. Please submit a prerequisite override request in the system. We use Python for all the coding.
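As an orientation to the style of coding the assignments involve, the sketch below shows the basic agent-environment interaction loop that all of the deep reinforcement learning methods in the course build on. It is a minimal illustration only: the Gymnasium package, the CartPole task, and the random placeholder policy are assumptions for this example, not the official course scaffold.

```python
# A minimal sketch of the agent-environment loop behind deep RL assignments.
# Assumes the Gymnasium package (pip install gymnasium); the random policy is a
# stand-in for the neural-network policies implemented later in the course.
import gymnasium as gym

env = gym.make("CartPole-v1")           # simple control task used for illustration
obs, info = env.reset(seed=0)

total_reward = 0.0
for step in range(500):
    action = env.action_space.sample()  # placeholder policy: act at random
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print(f"Total reward collected by the random policy: {total_reward:.1f}")
```

In the course assignments, the random action choice would be replaced by a learned policy, for example a DQN or a policy-gradient network.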
Course Objectives and Course Outcomes
- This course equips students with packages of code used to solve example machine learning tasks. Understanding the implementation of existing algorithms helps students develop a thorough understanding of the algorithms.
- With the foundation set in this course, the students will have the confidence to delve deeper into unsolved problems related to financial applications.
- Students will gain experience coding on top-of-the-line CPU and GPU workstations in the Hanlon Financial Systems Labs.
- Students will understand the provided experiments and be able to improve the code.
- This lab course can accompany any course in deep learning. For the FE and FA programs it is unique, as we do not currently offer any other hands-on lab in AI.
After completing the course, students will be able to:
- Understand basic deep reinforcement learning algorithms.
- Apply deep reinforcement learning in practice.
- Analyze and evaluate relevant packages developed by the Hanlon Labs researchers.
- Create solutions to challenges in practical financial application projects.
Course Logistics
The main assignments of this course are implementations of deep reinforcement learning methods in Python. The experiments we do in this course may require a Linux environment. The software environment will be prepared on the workstations in the Hanlon Financial Systems Lab on the 4th floor of the Babbio Center. Students will have time to test the provided code during the in-class experiment sessions. Since most students will need more time before and after class to further understand the code and solve problems, it is recommended that students come to the lab when it is not occupied by other classes. It is also recommended that students set up their own computers when possible.
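Before the first in-class experiment session, it is worth confirming that your Python environment can see a GPU. The short check below assumes PyTorch as the deep learning framework; the syllabus does not mandate a specific framework, so adapt it to whatever is installed on the lab workstations or your own machine.

```python
# A quick environment check, assuming PyTorch is the installed framework
# (the course materials may use a different library on the lab workstations).
import torch

print("PyTorch version:", torch.__version__)
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("No CUDA device found; falling back to CPU.")

# Small tensor operation to verify the selected device works end to end.
x = torch.randn(1024, 1024, device=device)
y = x @ x
print("Matrix product computed on:", y.device)
```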
Instructors
Professor | Email | Office |
---|---|---|
Zheng Xing | zxing@stevens.edu | BC430 |
More Information
Textbook
Additional References
[1] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
[2] Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
[3] Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. PMLR, 2016.
[4] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).
[5] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
[6] Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. "Reward is enough." Artificial Intelligence (2021): 103535.
[7] Sutton, Richard S., Michael H. Bowling, and Patrick M. Pilarski. “The Alberta Plan for AI Research.” arXiv preprint arXiv:2208.11173 (2022).
[8] Merton, Robert C. "Lifetime portfolio selection under uncertainty: The continuous-time case." The Review of Economics and Statistics (1969): 247-257.
[9] Buehler, Hans, Lukas Gonon, Josef Teichmann, Ben Wood, Baranidharan Mohan, and Jonathan Kochems. "Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning." SSRN Scholarly Paper ID 3355706, Social Science Research Network, Rochester, NY (2019).
[10] Nevmyvaka, Yuriy, Yi Feng, and Michael Kearns. "Reinforcement learning for optimized trade execution." In Proceedings of the 23rd international conference on Machine learning, pp. 673-680. 2006.
[11] Ganesh, Sumitra, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, and Manuela Veloso. "Reinforcement learning for market making in a multi-agent dealer market." arXiv preprint arXiv:1911.05892 (2019).
[12] Avellaneda, Marco, and Sasha Stoikov. "High-frequency trading in a limit order book." Quantitative Finance 8, no. 3 (2008): 217-224.
[13] Rao, Ashwin, and Tikhon Jelvis. Foundations of Reinforcement Learning with Applications in Finance. CRC Press, 2022.
Course Requirements
Attendance: Attendance will be taken every week for both in-person and virtual students.
Participation: Participation will not be recorded for grading purposes, but extra points can be given for outstanding in-class participation.
Homework: The readings are required to be finished before the lecture. The assignments are coding homework that must be submitted before the day of the in-class experiment session; this ensures students investigate the code before class. The in-class experiment session will be used to discuss the details and show some solutions.
Project(s): The midterm presentation and final presentation are graded on the performance of the learning methods the students implement and on the quality of the presentations themselves. Students are expected to use slides to walk through the details of their implementations.
Tentative Course Schedule
Week | Topic | Readings | Assignments |
---|---|---|---|
Week 1 | Introduction and Setup | Textbook Chapter 1, 2, 3, 4 | |
Week 2 | MDP, Dynamic Programming | [1] “Human-level control through deep reinforcement learning.” Nature 518, no.7540 (2015): 529-533. | |
Week 3 | Value Based Methods, Q-learning. | Textbook Chapter 5, 6, 7 | Learning to Play Video Games with DQN |
Week 4 | Policy Gradient Methods | Textbook Chapter 9, 10, 11 [2] “Continuous control with deep reinforcement learning.” arXiv preprint arXiv:1509.02971 (2015). | Learning to Control Dynamic Systems with Policy Gradient |
Week 5 | Actor-critic Methods. | [8] Merton, Robert C. “Lifetime portfolio selection under uncertainty: The continuous-time case.” | Learning with DDPG |
Week 6 | Application: Dynamic Asset Allocation and Consumption | Textbook Chapter 12, 13 [9] "Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning" | Deep Portfolio Optimization |
Week 7 | Application: Derivatives Pricing and Hedging | [3] “Asynchronous methods for deep reinforcement learning.” In International conference on machine learning, pp. 1928-1937. PMLR, 2016. | Deep Hedging |
Week 8 | Midterm Presentation | | Learning with A2C |
Week 9 | Model Based, Dyna Architecture. | Textbook Chapter 8 [10] "Reinforcement learning for optimized trade execution” | |
Week 10 | Application: Optimal Trade Order Execution | [4] “Prioritized experience replay.” arXiv preprint arXiv:1511.05952 (2015). | Deep Trade Order Execution |
Week 11 | Prioritized Experience Replay | [5] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms.” arXiv preprint arXiv:1707.06347 (2017). | DQN with PER |
Week 12 | Proximal Policy Optimization | [11] "Reinforcement learning for market making in a multi-agent dealer market." | Learning with PPO |
Week 13 | Application: Optimal Market Making | [6] Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. “Reward is enough.” Artificial Intelligence (2021): 103535. | |
Week 14 | Hierarchical Deep Reinforcement Learning and Final Presentation | | |