Author: Chun Chen
Degree: M.S. in Financial Engineering
Advisory Committee: Dr. Dragos Bozdog, Dr. Ionut Florescu
Abstract: News articles can be a powerful source of information for financial time series prediction. However, extracting useful information from a large amount of unstructured data and representing the features in a small number of dimensions remains a major challenge for researchers. This paper introduces a modified topic model based on latent Dirichlet allocation (LDA), denoted Modified Financial LDA (M-FinLDA), which combines sentiment analysis with text mining to extract features from news articles. Under some assumptions, we provide the detailed mathematics behind the posterior distributions used in Gibbs sampling and the log-likelihood used to fix the hyperparameters of the M-FinLDA model. We propose a basic framework for applying M-FinLDA in text mining; the features from M-FinLDA can serve as additional input features for any machine learning algorithm to improve its predictions. The experimental results show that the features from M-FinLDA empirically give better results than the comparative features, including topic distributions from LDA and FinLDA.