FE529 Deep Reinforcement Learning: Applications in Finance

Campus	Fall	Spring	Summer
On Campus		X
Web Campus		X

Course Description

Applying dynamic programming to finance has been a research goal since the 1960s. Throughout the history of artificial intelligence, putting theory into practice has often proved the biggest obstacle, and the recent success of AI owes much to engineers contributing open-source implementations. Artificial intelligence has become a fundamental part of building applications in finance. This course surveys basic topics in reinforcement learning and its most recent extension, deep reinforcement learning. Beyond the theory, students receive help and guidance with the engineering difficulties of building applications. The course emphasizes experiments performed on the Hanlon Financial Systems Labs computing infrastructure, with initial setups ready to use. Students gain hands-on experience implementing deep reinforcement learning algorithms and applying them to problems in the financial industry, including Dynamic Asset-Allocation and Consumption, Derivatives Pricing and Hedging, Optimal Trade Order Execution, and Optimal Market Making.

Prerequisites: FA590 (or equivalent) and FE520 (or equivalent)

Course Objectives and Course Outcomes

This course provides students with code packages that solve example machine learning tasks. Studying the implementation of existing algorithms gives students a thorough understanding of those algorithms.
With the foundation set in this course, students will have the confidence to delve deeper into unsolved problems in financial applications.
They will gain experience coding on top-of-the-line CPU and GPU workstations in the Hanlon Financial Systems Labs.
Students will learn to understand our experiments and improve the code.
This lab course can accompany any course in deep learning. Within FE and FA it is unique, as no other course offers a hands-on lab in AI.

After completing the course, students will be able to:

Understand basic deep reinforcement learning algorithms.
Apply deep reinforcement learning in practice.
Analyze and evaluate relevant packages developed by the Hanlon Labs researchers.
Create solutions to challenges in practical financial application projects.

Instructors

Professor	Email	Office
Zheng Xing	zxing@stevens.edu	BC430

Textbook

Reinforcement Learning: An Introduction (second edition) by Richard S. Sutton and Andrew G. Barto, ISBN-13: 978-0262039246

Additional References

[1] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

[2] Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).

[3] Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. PMLR, 2016.

[4] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).

[5] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

[6] Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. "Reward is enough." Artificial Intelligence (2021): 103535.

[7] Sutton, Richard S., Michael H. Bowling, and Patrick M. Pilarski. "The Alberta Plan for AI Research." arXiv preprint arXiv:2208.11173 (2022).

[8] Merton, Robert C. "Lifetime portfolio selection under uncertainty: The continuous-time case." The Review of Economics and Statistics (1969): 247-257.

[9] Buehler, Hans, Lukas Gonon, Josef Teichmann, Ben Wood, Baranidharan Mohan, and Jonathan Kochems. "Deep hedging: Hedging derivatives under generic market frictions using reinforcement learning." SSRN Scholarly Paper ID 3355706, Social Science Research Network, Rochester, NY (2019).

[10] Nevmyvaka, Yuriy, Yi Feng, and Michael Kearns. "Reinforcement learning for optimized trade execution." In Proceedings of the 23rd international conference on Machine learning, pp. 673-680. 2006.

[11] Ganesh, Sumitra, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, and Manuela Veloso. "Reinforcement learning for market making in a multi-agent dealer market." arXiv preprint arXiv:1911.05892 (2019).

[12] Avellaneda, Marco, and Sasha Stoikov. "High-frequency trading in a limit order book." Quantitative Finance 8, no. 3 (2008): 217-224.

[13] Rao, Ashwin, and Tikhon Jelvis. Foundations of Reinforcement Learning with Applications in Finance. CRC Press, 2022.

[14] Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112, no. 1-2 (1999): 181-211.

Tentative Course Schedule

	Topic(s)	Reading(s)	Optional Reading	HW
Week 1	Introduction and Setup	Textbook 1.1 to 1.5, 3.1 to 3.7 and 4.1 to 4.4.	The rest of Textbook Chapter 1, 2, 3 and 4.
Week 2	MDP, Dynamic Programming	[1] “Human-level control through deep reinforcement learning.” Nature 518, no.7540 (2015): 529-533.		Finish setting up lab computer and personal computer.
Week 3	Value Based Methods, Q-learning	Textbook 5.1 to 5.7, 6.1 to 6.5 and 7.7.	The rest of Textbook Chapter 5, 6 and 7.	Learning to Play Video Games with DQN
Week 4	Policy Gradient Methods	[2] “Continuous control with deep reinforcement learning.” arXiv preprint arXiv:1509.02971 (2015).	Textbook Chapter 9, 10 and 11.	Learning to Control Dynamic Systems with Policy Gradient
Week 5	Actor-critic Methods	[8] Merton, Robert C. “Lifetime portfolio selection under uncertainty: The continuous-time case.”		Learning with DDPG
Week 6	Application: Dynamic Asset-Allocation and Consumption	[9] “Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning.”	Textbook Chapters 12 and 13.	Deep Portfolio Optimization
Week 7	Application: Derivatives Pricing and Hedging	[3] “Asynchronous methods for deep reinforcement learning.” In International conference on machine learning, pp. 1928-1937. PMLR, 2016.		Deep Hedging
Week 8	Mid-term Exam			Learning with A2C
Week 9	Model-Based Methods, Dyna Architecture	[10] “Reinforcement learning for optimized trade execution.”	Textbook Chapter 8.
Week 10	Application: Optimal Trade Order Execution	[4] “Prioritized experience replay.” arXiv preprint arXiv:1511.05952 (2015).		Deep Trade Order Execution
Week 11	Prioritized Experience Replay	[5] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms.” arXiv preprint arXiv:1707.06347 (2017).		DQN with PER
Week 12	Proximal Policy Optimization	[11] "Reinforcement learning for market making in a multi-agent dealer market."		Learning with PPO
Week 13	Application: Optimal Market Making	[6] Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. “Reward is enough.” Artificial Intelligence (2021): 103535.
Week 14	Hierarchical Deep Reinforcement Learning	[14] Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial intelligence 112, no. 1-2 (1999): 181-211.
Final Week	Final Presentation
