Machine Learning Techniques Applied to US Indexes Returns Forecasting

Author: Rodrigo Silva Cosme
Degree: M.S. in Financial Engineering
Year: 2016
Advisory Committee: Dr. Dragos Bozdog, Dr. Khaldoun Khashanah, Dr. David Starer, Dr. Ionut Florescu

Abstract: In this thesis we test the impact of dimensionality reduction techniques on trading strategies based on machine learning. The dimensionality reduction is performed either by Feature Selection alone or by combining Feature Selection and Feature Extraction. The Feature Selection algorithms studied are F-Test, Granger Causality, and Randomized Lasso, and the Feature Extraction techniques analyzed are Principal Component Analysis (PCA) and Factor Analysis (FA). The tests are conducted on three American indexes. The dataset is composed of 216 features, including technical indicators, world indexes, US economic indicators, foreign exchange rates, and US and foreign bonds, for the period between 2000 and 2016. The tests are conducted on a rolling window using 1000 days for training and 100 days for out-of-sample testing.
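The rolling-window scheme described above can be illustrated with a minimal sketch. The 1000-day training and 100-day testing sizes come from the abstract; the function name `rolling_windows` and the assumption that the window advances by one test block (100 days) at a time are hypothetical and are not stated in the source.

```python
from typing import Iterator, Optional, Tuple


def rolling_windows(
    n_obs: int,
    train_size: int = 1000,
    test_size: int = 100,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for a rolling window.

    Assumption: the window slides forward by `step` observations, which
    defaults to `test_size` so that test blocks do not overlap.
    """
    step = test_size if step is None else step
    start = 0
    # Stop once a full train + test block no longer fits in the data.
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


# Usage: index ranges for a hypothetical series of 1300 daily observations.
for train_idx, test_idx in rolling_windows(1300):
    print(train_idx, test_idx)
```

Each test block starts immediately after its training block, so no out-of-sample observation is ever used for fitting within the same window.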

The time series forecasting methods applied are Support Vector Regression (SVR) and Nonlinear Autoregressive Neural Networks with Exogenous Inputs (NARX). SVR without any dimensionality reduction outperformed the buy-and-hold benchmark portfolio. As the dimensionality of the system was reduced, the performance of SVR decreased, with only 50% of the reduced models outperforming the benchmarks. The opposite pattern was observed in the NARX models, which benefited from Feature Selection and, to some extent, from Feature Extraction. The Sharpe ratio, returns, and percentage of profitable trades increased when NARX was combined with Feature Selection. The set of features selected by Randomized Lasso proved more stable than those selected by F-Test and Granger Causality, and its performance was superior with both learning algorithms.
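The three performance measures named above (Sharpe ratio, returns, and percentage of profitable trades) can be computed from a series of daily strategy returns, as in the sketch below. This is an illustration under stated assumptions, not the thesis's own evaluation code: the risk-free rate is taken as zero, annualization uses 252 trading days, each day with a nonzero return is counted as one trade, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = stdev(returns)  # sample standard deviation
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return mean(returns) / sd * math.sqrt(periods_per_year)


def cumulative_return(returns: Sequence[float]) -> float:
    """Compounded return over the whole period."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def pct_profitable(returns: Sequence[float]) -> float:
    """Fraction of trades with a positive return.

    Assumption: every day with a nonzero return counts as one trade.
    """
    trades = [r for r in returns if r != 0]
    if not trades:
        return 0.0
    return sum(r > 0 for r in trades) / len(trades)


# Usage with made-up daily returns (illustrative values only).
daily = [0.01, -0.005, 0.02, 0.0, -0.01]
print(sharpe_ratio(daily), cumulative_return(daily), pct_profitable(daily))
```

Comparing these figures for a strategy against the same figures for a buy-and-hold position over the same test window gives the kind of benchmark comparison the abstract refers to.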

The tests were conducted on a GPU computing cluster using specialized Python machine learning packages such as scikit-learn and Theano.