BIA656 Statistical Learning and Analytics

Course Catalog Description

Introduction

The significant amount of corporate information available requires a systematic and analytical approach to select the most important information and anticipate major events. Machine learning algorithms facilitate this process understanding, modeling and forecasting the behavior of major corporate variables.

This course introduces statistical and graphical (machine learning) models used for inference and prediction. The emphasis of the course is in the learning capability of the algorithms and their application to several business areas.

Prerequisites:

Basic course in probability and statistics at the level of MGT 620 or BIA 652 Multivariate data analytics.

Campus	Fall	Spring	Summer
On Campus
Web Campus

Instructors

Professor	Email	Office
Germán Creamer	gcreamer@stevens.edu	Babbio 637

More Information

Course Outcomes

By the end of this course, the students will be able to:

Understand the foundations of statistical learning algorithms
Apply statistical models and analytical methods to several business domains using a statistical language.
Recognize the value and also the limits of statistical learning algorithms to solve business problems.

Additional learning objectives include the development of:

Written and oral communications skills: students are required to communicate properly during the class discussions and project class presentations. Homeworks and project report should be presented “as if” they were submitted to a senior manager of a major corporation.
Solve a major analytical problem using large and heterogeneous datasets in a group project and communicate its results in a professional way.

Course Resources

Textbook

Required Text(s):

Foster Provost and Tom Fawcett, Data Science for Business, O’Reilly, 2013. (code to get a discount on oreilly.com: AUTHD)

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. (Amazon.com sells the paperback version (2013))

Case Pilgrim Bank A (602104), Harvard Business School. You must register in the following website, buy the case and download related documents: https://cb.hbsp.harvard.edu/cbmp/access/28615189

Optional Readings:

Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning. Springer-Verlag, New York, 2010 (selected sections) (downloadable at http://www-stat.stanford.edu/~tibs/ElemStatLearn/).

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schutze, Introduction to Information Retrieval, Cambridge University Press. 2008 (downloadable at http://nlp.stanford.edu/IR-book).

R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. Tom M. Mitchell, Machine Learning, McGraw-Hill Series in Computer Science, 1997.

Vasant Dhar and Roger Stein. Seven methods for transforming corporate data into business intelligence. Upper Saddle River: Prentice Hall. 1997.

Additional Free Texts:

A. Rajaraman, J. Ullman Mining of Massive Datasets Book (very useful for big data problems)

Mohammed Zaki and Wagner Meira Jr. Mohammed Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms (draft)

StatSoft Electronic Statistics Textbook (statistics and data mining)

Roberto Battiti and Mauro Brunato LIONbook: Learning and Intelligent Optimization (introductory)

Grading

Grading Policies

Assignments:

The course will have a main project and 4 assignments/cases of data analysis. The assignments must be submitted electronically through the course website before the beginning of the class of the assigned day. Each student must submit his/her own report. You should also include the Readme, log and code files if you used a script or wrote a program. E-mail submissions will not be accepted. Each assignment has a value of 5 points.

Project:

The project requires that participants build a decision support system (DSS) based on one of the methods explored in this course. Each project must be developed by groups of three students and they should present a project proposal at the middle of the semester.

Software: Python is the main software packages that will be used. You should participate in the Python bootcamp offered by the school at the beginning of the semester.

Grades

Assignment	Grade %
Assignments/cases	20%
Team project	30%
Participation	10%
Final exam	40%
Total Grade	100%

Lecture Outline

	Topic	Reading
Week 1	Introduction to data science and data analytic thinking	PF, ch. 1 and 2
Week 2	Predictive modeling	PF, ch. 3 B., 1.3, 1.4, 1.5
Week 3	From correlation to supervised segmentation	B, 1.6, 14.4 Optional reference: HTF, ch. 9.2
Week 4	Linear models	PF, ch. 4 B. 3.1, 4.1.1-4.1.3, 4.3.2
Week 5	Support vector machines	B, 6.1, 6.2, 7.1 Optional references: HTF, ch. 12; MRS, ch. 15
Week 6	Model performance analysis	PF, ch. 5, 7 and 8
Week 7	Graphical Models	PF, ch. 9; B. 1.2 Optional references: HTF, ch. 8.3-8.4; MRS, ch. 11, 13
Week 8	Graphical Models	B, Ch. 8
Week 9	Relational learning: Bayesian models	PF, ch. 11 B, 11.1, 11.2, 11.3
Week 10	Application to marketing: Targeting consumers Sequential data (time series): Markov decision processes: -Reinforcement learning -Time series -Application to trading	Case Pilgrim Bank 1st part B, 13.1 http://www1.icsi.berkeley.edu/~moody/MoodySaffellTNN01.pdf
Week 11	Sequential data (time series): Hidden Markov models	B, 13.2
Week 12	Mean variance decomposition Combining models: Ensemble methods	Case Pilgrim Bank 2nd part B, 3.2, 14.2-14.3 PF, ch. 12 Optional references (click on each): ADTrees, Bagging, Random Forests HTF, 8.7, 10.1, 15.1-15.3, 16
Week 13	Combining models. Application to finance: Mixed trading strategies algorithmic trading	B, 14.1, 14.4, 14.5 Creamer, Model calibration…, Quantitative.
Week 14	Final presentations

PF: Provost and Fawcett, Data Science for Business

B: C. Bishop, Pattern Recognition and Machine Learning

Optional readings:
HTF: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning. 2010
MRS: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schutze, Introduction to Information Retrieval, 2008.

BIA656 Statistical Learning and Analytics

Introduction

Course Outcomes

Textbook

Grading Policies

Application of Reinforcement Learning in Financial Trading and Execution

How to Enable the Bloomberg Excel Add-In on Lab Computers

Interpreting Machine Learning Models in Empirical Asset Pricing