FE529 Deep Reinforcement Learning: Applications in Finance

Campus Fall Spring Summer
On Campus X
Web Campus X

Course Description

Applying dynamic programming to problems in finance has been a goal of study since the 1960s. Throughout the history of artificial intelligence, translating theory into practice has often proven to be the biggest obstacle. The recent success of AI has been made possible in large part by engineers contributing to open-source implementations, and artificial intelligence has become a fundamental part of building applications in finance. In this course, we survey some of the basic topics in the field of reinforcement learning and its most recent extension, deep reinforcement learning. Beyond the theory, students receive help and guidance on the engineering difficulties of building applications. The course emphasizes experiments performed on the Hanlon Financial Systems Labs computing infrastructure, with initial setups ready to use, and provides hands-on experience implementing deep reinforcement learning algorithms with applications in the financial industry. Applications include, for example, Dynamic Asset-Allocation and Consumption, Derivatives Pricing and Hedging, Optimal Trade Order Execution, and Optimal Market Making.
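
To give a flavor of the hands-on work, the sketch below applies tabular Q-learning (a Week 3 topic) to a hypothetical toy order-execution problem. The environment, reward, state and action definitions, and hyperparameters are illustrative assumptions made for this sketch rather than course code; the lab assignments develop the deep-learning variants of these ideas on the Hanlon infrastructure.

    # Illustrative only: a minimal tabular Q-learning loop on a toy
    # trade-execution MDP (sell `shares_total` shares over `horizon` steps).
    # The environment, reward, and hyperparameters are simplified assumptions
    # made for this sketch, not material from the course.
    import numpy as np

    rng = np.random.default_rng(0)

    horizon, shares_total = 5, 4            # state: (time step, shares remaining)
    actions = np.arange(shares_total + 1)   # action: number of shares to sell now
    alpha, gamma, epsilon = 0.1, 1.0, 0.1   # learning rate, discount, exploration

    # Q-table indexed by (time, remaining shares, action)
    Q = np.zeros((horizon, shares_total + 1, len(actions)))

    def step(remaining, sell):
        """Toy market: a temporary price-impact term penalizes selling too fast."""
        sell = min(sell, remaining)
        price = 100.0 + rng.normal(0.0, 0.1) - 0.5 * sell
        reward = sell * price
        return remaining - sell, reward

    for episode in range(5000):
        remaining = shares_total
        for t in range(horizon):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(len(actions)))
            else:
                a = int(np.argmax(Q[t, remaining]))
            # force liquidation of whatever is left at the final step
            if t == horizon - 1:
                a = remaining
            next_remaining, r = step(remaining, int(actions[a]))
            # one-step Q-learning update toward the bootstrapped target
            target = r if t == horizon - 1 else r + gamma * np.max(Q[t + 1, next_remaining])
            Q[t, remaining, a] += alpha * (target - Q[t, remaining, a])
            remaining = next_remaining

    # Greedy schedule learned from the table (shares to sell at each step)
    remaining = shares_total
    for t in range(horizon):
        a = remaining if t == horizon - 1 else int(np.argmax(Q[t, remaining]))
        print(f"t={t}: sell {min(a, remaining)} of {remaining}")
        remaining -= min(a, remaining)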

Prerequisites: FA590 (or equivalent) and FE520 (or equivalent)


Course Objectives and Course Outcomes

  1. This course provides students with packages of code used to solve example machine learning tasks. Studying the implementation of existing algorithms helps students obtain a thorough understanding of those algorithms.
  2. With the foundation set in this course, students will have the confidence to delve deeper into unsolved problems related to financial applications.
  3. Students will gain experience coding on the top-of-the-line CPU and GPU workstations in the Hanlon Financial Systems Labs.
  4. Students will understand the course experiments and improve on the provided code.
  5. This lab course can accompany any course in deep learning. Within the FE and FA programs it is unique, as neither program otherwise offers a hands-on lab in AI.

After completing the course, students will be able to:

  1. Understand basic deep reinforcement learning algorithms.
  2. Apply deep reinforcement learning in practice.
  3. Analyze and evaluate relevant packages developed by the Hanlon Labs researchers.
  4. Create solutions to challenges in practical financial application projects.


Instructors

Professor | Email | Office
Zheng Xing | zxing@stevens.edu | BC430

More Information

Textbook

  • Reinforcement Learning: An Introduction (second edition) by Richard S. Sutton and Andrew G. Barto, ISBN-13: 978-0262039246
  • Additional References

    [1] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

    [2] Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).

    [3] Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International Conference on Machine Learning, pp. 1928-1937. PMLR, 2016.

    [4] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).

    [5] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

    [6] Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. "Reward is enough." Artificial Intelligence (2021): 103535.

    [7] Sutton, Richard S., Michael H. Bowling, and Patrick M. Pilarski. "The Alberta Plan for AI Research." arXiv preprint arXiv:2208.11173 (2022).

    [8] Merton, Robert C. "Lifetime portfolio selection under uncertainty: The continuous-time case." The Review of Economics and Statistics (1969): 247-257.

    [9] Buehler, Hans, Lukas Gonon, Josef Teichmann, Ben Wood, Baranidharan Mohan, and Jonathan Kochems. "Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning." SSRN Scholarly Paper ID 3355706. Social Science Research Network, Rochester, NY (2019).

    [10] Nevmyvaka, Yuriy, Yi Feng, and Michael Kearns. "Reinforcement learning for optimized trade execution." In Proceedings of the 23rd International Conference on Machine Learning, pp. 673-680. 2006.

    [11] Ganesh, Sumitra, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, and Manuela Veloso. "Reinforcement learning for market making in a multi-agent dealer market." arXiv preprint arXiv:1911.05892 (2019).

    [12] Avellaneda, Marco, and Sasha Stoikov. "High-frequency trading in a limit order book." Quantitative Finance 8, no. 3 (2008): 217-224.

    [13] Rao, Ashwin, and Tikhon Jelvis. Foundations of Reinforcement Learning with Applications in Finance. CRC Press, 2022.

    [14] Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112, no. 1-2 (1999): 181-211.


Tentative Course Schedule

Week | Topic(s) | Reading(s) | Optional Reading | HW
Week 1 | Introduction and Setup | Textbook 1.1 to 1.5, 3.1 to 3.7, and 4.1 to 4.4 | The rest of Textbook Chapters 1, 2, 3, and 4 |
Week 2 | MDP, Dynamic Programming | [1] "Human-level control through deep reinforcement learning" | | Finish setting up lab computer and personal computer
Week 3 | Value-Based Methods, Q-learning | Textbook 5.1 to 5.7, 6.1 to 6.5, and 7.7 | The rest of Textbook Chapters 5, 6, and 7 | Learning to Play Video Games with DQN
Week 4 | Policy Gradient Methods | [2] "Continuous control with deep reinforcement learning" | Textbook Chapters 9, 10, and 11 | Learning to Control Dynamic Systems with Policy Gradient
Week 5 | Actor-Critic Methods | [8] "Lifetime portfolio selection under uncertainty: The continuous-time case" | | Learning with DDPG
Week 6 | Application: Dynamic Asset-Allocation and Consumption | [9] "Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning" | Textbook Chapters 12 and 13 | Deep Portfolio Optimization
Week 7 | Application: Derivatives Pricing and Hedging | [3] "Asynchronous methods for deep reinforcement learning" | | Deep Hedging
Week 8 | Mid-term Exam | | | Learning with A2C
Week 9 | Model-Based Methods, Dyna Architecture | [10] "Reinforcement learning for optimized trade execution" | Textbook Chapter 8 |
Week 10 | Application: Optimal Trade Order Execution | [4] "Prioritized experience replay" | | Deep Trade Order Execution
Week 11 | Prioritized Experience Replay | [5] "Proximal policy optimization algorithms" | | DQN with PER
Week 12 | Proximal Policy Optimization | [11] "Reinforcement learning for market making in a multi-agent dealer market" | | Learning with PPO
Week 13 | Application: Optimal Market Making | [6] "Reward is enough" | |
Week 14 | Hierarchical Deep Reinforcement Learning | [14] "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning" | |
Final Week | Final Presentation | | |