2020 Fall FE690 Machine Learning in Finance



Course Catalog Description

Introduction

This course covers traditional machine learning algorithms (random forests, support vector machines, and conditional random fields) as well as recent developments in deep neural networks, working primarily with the TensorFlow library. It develops the distinctions among the types of ML algorithms (supervised, unsupervised, and reinforcement learning) and the relationship between the quality of the data and the complexity of the model.

Campus Fall Spring Summer
On Campus X
Web Campus X

Instructors

Professor Email Office
Gary R. Engler, Ph.D. gengler@stevens.edu

More Information

Course Description

Applying Machine Learning (ML) and Artificial Intelligence (AI) to finance takes more than knowledge of algorithms. Understanding the algorithms is fundamental to the discipline, but it is also necessary to understand the tradeoffs of each algorithm, how each scales in production, and how to explain the problem, the solution, and the field to people who are not technically proficient.

This course will focus on traditional machine learning algorithms (random forests, support vector machines, and conditional random fields) as well as recent developments in deep neural networks, working primarily with the TensorFlow library. The distinctions between the various types of ML algorithms (supervised, unsupervised, and reinforcement learning) will be developed, as well as the relationship between the quality of the data and the complexity of the model. The main project for the course is the development of an AI/ML-driven FinTech startup. The final project consists of a working prototype and two separate presentations: a pitch deck for your “startup” built around the business case, and a technical presentation of your prototype with a demonstration. Examples and exercises on deploying ML models into production will be discussed, involving Docker, Kubernetes, and microservices architecture. This course will have a strong self-learning component.

Prerequisites:

  • Strong knowledge of Python 3.6. Python 2.7 reached end of life on January 1, 2020; there is no excuse to continue using it outside a legacy system, and there are none in this class.
  • Probability and Statistics (graduate level or advanced undergraduate level)
  • Basic experience with GitHub
  • Strong desire to learn and push the boundaries of what you know

Datasets: While the datasets are subject to change, they will broadly follow the structures below.

Fannie Mae / Freddie Mac Mortgage Datasets: Traditional structured datasets covering the period leading up to the 2007 financial crisis, with information about both the acquisition of each loan and its performance.
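Structured loan data of this kind usually includes categorical fields, and the lecture outline lists one-hot encoding as a topic for this dataset. The following is a minimal sketch of the idea using only the standard library. The `loan_purposes` values and the `is_` column prefix are hypothetical and are not taken from the actual Fannie Mae or Freddie Mac file layouts.

```python
from typing import Dict, List


def one_hot_encode(values: List[str]) -> List[Dict[str, int]]:
    """Turn a categorical column into one 0/1 indicator column per category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    return [
        {"is_" + category: int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical categorical column, for illustration only.
loan_purposes = ["purchase", "refinance", "purchase", "cash_out"]
encoded = one_hot_encode(loan_purposes)

for row in encoded:
    print(row)
```

Each row ends up with exactly one indicator set to 1. In practice a library routine would likely be used for this; the sketch only shows what the transformation does.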

Text Data: Natural Language Processing / Understanding datasets, used to extract information from textual data and to work towards transforming unstructured data (bodies of text) into structured data (JSON, relational database entries, graph database) for consumption in models.

Stock Data: Time series data for understanding how models (Hidden Markov Model (HMM), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) networks) deal with temporal patterns.
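As a small illustration of the unstructured-to-structured transformation described under Text Data, the sketch below pulls fields out of free text with a regular expression and emits JSON. The sentence pattern, the company names, and the field names are all hypothetical; real text would need far more robust NLP than a single regular expression.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical sentence shape:
#   "<Company> reported revenue of $<amount> <million|billion> in Q<n> <year>"
PATTERN = re.compile(
    r"(?P<company>[A-Z][\w&. ]*?) reported revenue of "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion) "
    r"in Q(?P<quarter>[1-4]) (?P<year>\d{4})"
)

UNIT_SCALE = {"million": 1e6, "billion": 1e9}


def extract_records(text: str) -> List[Dict[str, Any]]:
    """Return one structured record per sentence that matches PATTERN."""
    records = []
    for match in PATTERN.finditer(text):
        records.append({
            "company": match.group("company").strip(),
            "revenue_usd": float(match.group("amount")) * UNIT_SCALE[match.group("unit")],
            "quarter": int(match.group("quarter")),
            "year": int(match.group("year")),
        })
    return records


# Made-up input text, for illustration only.
sample = (
    "Acme Corp reported revenue of $12.5 million in Q2 2019. "
    "Globex reported revenue of $3 billion in Q4 2018."
)
records = extract_records(sample)
print(json.dumps(records, indent=2))
```

The resulting list of dictionaries could equally be written to a relational table or a graph database; JSON is used here only because it needs no extra dependencies.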


Grading

Grading Policies

The final grade in the class will be determined in the following manner:

  • 30% Homework
  • 30% Midterm
  • 30% Final Project

I do not accept late assignments except for University-sanctioned reasons. Homework assignments will be submitted as Docker images built to solve the assigned problems, with supporting code in an associated GitHub repository.

PROJECT:

The project will focus on the development of a prototype for an AI/ML-driven FinTech startup. The final presentation consists of two parts: one for the business case, discussing the business and financial applications of the technology, and one for the technological prototype. A proof of concept should be developed for the project; this is a working (though not necessarily perfect) demo of the proposed system that shows the technical aspects are capable of functioning. Projects will be done in small teams of no more than three individuals.


Lecture Outline

Topic Reading
Week 1 Introduction and course overview, Python versions and packages, Docker, GitHub, Unit Tests, REST API, Data Sources
Week 2 Supervised / Unsupervised / Reinforcement Learning, Classification / Regression / Generative Models
Week 3
Week 4 Dataset 01: Fannie Mae Dataset
Week 5 Working with structured data, one-hot encoding
Week 6 Working with categorical variables
Week 7 Dataset 02: Text Data
Week 8 Converting unstructured data to structured data
Week 9
Week 10 Dataset 03: Stock Data
Week 11 Feature extraction, applications of regression
Week 12 Determining changes in behavior
Week 13 Floating Week
Week 14 Finals