FE582 Foundations of Financial Data Science
Course Catalog Description
Introduction
Campus | Fall | Spring | Summer |
---|---|---|---|
On Campus | X | X | |
Web Campus | X | X | |
Instructors
Professor | Email | Office |
---|---|---|
Dragos Bozdog | dbozdog@stevens.edu | Babbio 429A |
More Information
Course Description
This course is the first in the Financial Services Analytics certificate. Financial services analytics is the science and technology of creating data-driven decision-making analytics for the financial services industry. Such analytics can lead to more effective business operations, enhanced customer services and product offerings, and improved risk analysis and risk management. This course is the key building block of the certificate because good data and an understanding of that data are critical to creating robust financial services analytics. The certificate's knowledge base comprises four key areas:
- Foundations of Financial Data Science (FE-582)
- Introduction to Knowledge Engineering (FE-590)
- Financial Systems Technology (FE-595)
- Data Visualization Applications (FE-550)
Co-Requisite: FE 513 – Practical Aspects of Database Design
Course Outcomes
After taking this course, the students will be able to:
- Have a working knowledge of the issues of data quality, data storage, data scrubbing, data flows, and data encryption and their potential solutions.
- Understand and design various schemas needed for the representation of financial data.
- Tackle problems dealing with data management issues such as collection, warehousing, preprocessing and querying.
- Apply a primer on database management, including its advantages and disadvantages, gained from the attached lab course FE 513.
- Understand how to write applications using the MapReduce framework on Hadoop clusters.
- Have a working understanding of all the databases available to them through the Hanlon lab.
- Apply the newly acquired data management and database skills to financial data from the capital markets, social media, and the financial services sector.
Course Resources
Textbook
No single textbook covers all the topics. Several references will be used and supplementary notes will be provided whenever appropriate.
Additional References
Charu C. Aggarwal, Data Classification: Algorithms and Applications, CRC Press, 2015. (ISBN: 978-1-4665-8674-1)
Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015. (ISBN: 978-3-319-14141-8)
Deborah Nolan and Duncan Temple Lang, Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving, CRC Press, 2015. (ISBN: 978-1-4822-3481-7)
Norman Matloff, The Art of R Programming, No Starch Press, 2011. (ISBN: 978-1-59327-384-2)
Cathy O’Neil and Rachel Schutt, Doing Data Science, O’Reilly, 2014. (ISBN: 978-1-449-35865-5)
Grading
Grading Policies
- Assignments 60%
- Project 40%
- Final Exam 50%
Lecture Outline
Week | Topic | Reading |
---|---|---|
Week 1 | Introduction to Financial Data Science. Data Science Process; Sample Data Processing; The Basic Data Types; The Major Building Blocks: A Bird’s Eye View; Introduction to R; Case Study: Exploratory Data Analysis (NYC Real Estate) | |
Week 2 | Financial Data Quality Issues and Data Scrubbing. Data Preparation; Feature Extraction and Portability; Data Cleaning; Data Reduction and Transformation; Handling Missing Entries; Handling Incorrect and Inconsistent Entries; Sampling for Static Data and Data Streams; Dimensionality Reduction Intro | |
Week 3 | Case Study: Data and Web Technologies. Web page retrieval, scraping, regular expression extraction, and basic statistical techniques to identify wrong data entries; Linear Model; Piecewise Linear Model | |
Week 4 | Similarity and Distances. Impact of High Dimensionality; Lp-Norm; Generalized Minkowski Distance; Contrast; Impact of Locally Irrelevant Features; Impact of Different Lp-Norms; Match-Based Similarity Computation; Impact of Data Distribution; ISOMAP; Impact of Local Data Distribution; Similarity on Categorical Data; Similarity on Mixed Quantitative and Categorical Data; Text Similarity Measures; Time Series Similarity Measures | |
Week 5 | Classification Methods. Logistic Regression; Linear Discriminant Analysis; Quadratic Discriminant Analysis; K-NN | |
Week 6 | Clustering Methods. K-Means Clustering; Hierarchical Clustering | |
Week 7 | Tree-Based Methods. Regression Trees. Tree Pruning. Using Decision Tree to Trade Stock; Building a Trading Strategy; Handling Time-Dependent Data in Python; The Prediction Models | |
Week 8 | Financial Time Series. Using Decision Tree to Trade Stock; Building a Trading Strategy; Handling Time-Dependent Data in R; The Prediction Models | |
Week 9 | Mining Text Data. Specific Characteristics; Document Preparation and Similarity Computation; Specialized Clustering Methods for Text; Probabilistic Algorithms; Co-Clustering; Topic Modeling; Specialized Classification Methods for Text | |
Week 10 | Case Study: Using Statistics to Identify Spam | |
Week 11 | Outlier Detection | |
Week 12 | No Class (Thanksgiving Recess). | |
Week 13 | Hadoop. HDFS. MapReduce. Hive. Pig | |
Week 14 | Final Project Presentations | |