2020 Fall FE690 Machine Learning in Finance
Course Catalog Description
Introduction
Campus | Fall | Spring | Summer |
---|---|---|---|
On Campus | X | | |
Web Campus | X | | |
Instructors
Professor | Email | Office |
---|---|---|
Gary R. Engler, Ph.D. | gengler@stevens.edu | |
More Information
Course Description
Applying Machine Learning (ML) and Artificial Intelligence (AI) to finance requires more than knowledge of algorithms. While understanding the algorithms is fundamental to the discipline, it is also necessary to understand the tradeoffs of each algorithm, how each scales in production, and how to explain the problem, the solution, and the field to people who are not technically proficient.
This course covers traditional machine learning algorithms (random forests, support vector machines, and conditional random fields) as well as recent developments in deep neural networks, primarily using the TensorFlow library. It develops the distinctions between the main types of ML (supervised, unsupervised, and reinforcement learning) and the relationship between the quality of the data and the complexity of the model. The main project centers on developing an AI/ML-driven FinTech startup. The final project consists of a working prototype and two separate presentations: a pitch deck for your “startup” built around the business case, and a technical presentation of your prototype with a demonstration. Examples and exercises on deploying ML models into production will involve Docker, Kubernetes, and microservices architecture. This course has a strong self-learning component.
Prerequisites:
- Strong knowledge of Python 3.6. Python 2.7 reached end-of-life on January 1, 2020, so there is no excuse to continue using it outside of a legacy system, and there won’t be one in this class.
- Probability and Statistics (graduate level or advanced undergraduate level)
- Basic experience with GitHub
- Strong desire to learn and push the boundaries of what you know
Datasets: While the datasets are subject to change, they will broadly follow the structure below.
- Fannie Mae / Freddie Mac Mortgage Datasets: Traditional structured datasets covering the period leading up to the 2007 financial crisis, with information about both the acquisition of each loan and its performance.
- Text Data: Natural Language Processing / Understanding datasets, used to extract information from text and to transform unstructured data (bodies of text) into structured data (JSON, relational database entries, graph databases) for consumption in models.
- Stock Data: Time series data for understanding how models (Hidden Markov Models (HMM), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks) deal with temporal patterns.
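As a rough illustration of the unstructured-to-structured transformation described for the text data, the sketch below pulls a dollar amount and a date out of a sentence and emits JSON. The sample sentence, the regular expressions, and the `extract_record` helper are all hypothetical and are not taken from the course materials. Real NLP pipelines would need far more robust extraction.

```python
import json
import re

# Hypothetical sample sentence; not taken from any course dataset.
TEXT = "Acme Corp reported revenue of $12.5 million on 2020-03-31."

# Illustrative patterns only (assumed formats for amounts and ISO dates).
AMOUNT_RE = re.compile(r"\$(\d+(?:\.\d+)?)\s+(million|billion)")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_record(text):
    """Return a dict of structured fields found in free text."""
    record = {"amount_usd": None, "date": None}
    amount = AMOUNT_RE.search(text)
    if amount:
        # Convert "12.5 million" into a plain number of dollars.
        record["amount_usd"] = float(amount.group(1)) * SCALE[amount.group(2)]
    date = DATE_RE.search(text)
    if date:
        record["date"] = date.group(0)
    return record


record = extract_record(TEXT)
# Serialize the structured record as JSON for downstream consumption.
print(json.dumps(record, sort_keys=True))
```

Fields that cannot be found are left as `None`, so a downstream model or database loader can tell missing values apart from extracted ones.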
Grading
Grading Policies
The final grade in the class will be determined in the following manner:
- 30% Homework
- 30% Midterm
- 30% Final Project
I do not accept late assignments except for University-sanctioned reasons. Homework assignments will take the form of Docker images built to solve the assigned problems, with supporting code in an associated GitHub repo.
PROJECT:
The project focuses on developing a prototype for an AI/ML-driven FinTech startup. The final presentation consists of two parts: one on the business case, discussing the business and financial applications of the technology, and one on the technological prototype. Each team should develop a proof of concept: a working (though not necessarily perfect) demo of the proposed system that shows the technical aspects can function. Projects will be done in small teams of no more than three individuals.
Lecture Outline
Week | Topic | Reading |
---|---|---|
Week 1 | Introduction and course overview, Python versions and packages, Docker, GitHub, Unit Tests, REST API, Data Sources | |
Week 2 | Supervised / Unsupervised / Reinforcement Learning, Classification / Regression / Generative Models | |
Week 3 | | |
Week 4 | Dataset 01: Fannie Mae Dataset | |
Week 5 | Working with structured data, one-hot encoding | |
Week 6 | Working with categorical variables | |
Week 7 | Dataset 02: Text Data | |
Week 8 | Converting unstructured data to structured data | |
Week 9 | | |
Week 10 | Dataset 03: Stock Data | |
Week 11 | Feature extraction, applications of regression, | |
Week 12 | determining changes in behavior | |
Week 13 | Floating Week | |
Week 14 | Finals | |
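For readers unfamiliar with the one-hot encoding topic listed for Week 5, here is a minimal pure-Python sketch. The `one_hot_encode` helper and the loan-purpose values are invented for illustration and do not come from the Fannie Mae dataset. In practice a library routine would likely be used instead.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector, one position per category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per row
        encoded.append(row)
    return categories, encoded


# Hypothetical loan-purpose codes, invented for illustration.
loan_purpose = ["purchase", "refinance", "purchase", "cash_out"]
categories, encoded = one_hot_encode(loan_purpose)
print(categories)
for row in encoded:
    print(row)
```

Each row contains a single 1, so the encoding introduces no artificial ordering among categories, which is the usual reason for preferring it over integer codes for nominal variables.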