Researchers
Maris Daly
Xizhao (Amber) Liu
Jacob Zuller
Faculty Advisors
Dr. Khaldoun Khashanah
Abstract
The study analyzes investment-grade corporate bonds using TRACE trade data and CDS data from 2018 to 2023. It tests several pricing methods, including machine learning models, and develops two trading strategies that use the resulting price predictions to identify profitable trades.
Introduction
The paper discusses the resurgence of interest in fixed income assets following post-COVID economic shifts: as interest rates have risen, investment in fixed income markets has grown. Key industry players, such as Bloomberg and MarketAxess, are advancing data-driven pricing, and electronic trading is becoming more common even though fixed income is more complex than equities.
Literature Review
The research builds on prior studies of electronic trading, machine learning in bond pricing, liquidity in bond markets, credit risk assessment, and recovery rates. These studies provide insights into corporate bond market dynamics, default risk, and the predictive capabilities of machine learning.
Data
The dataset covers 750 liquid investment-grade bonds held in the LQD ETF, combining intraday trade data from TRACE with CDS data from IHS Markit. Exploratory Data Analysis (EDA) examined pricing and liquidity factors. The data was then cleaned and standardized, and features were engineered to improve model prediction accuracy.
Methodology
The study employed three methodologies to price bonds and create trading strategies:
- Methodology 1: Applied various machine learning models, including OLS regression, penalized linear regression, ensemble methods (random forests, gradient boosting), and LSTM neural networks. Models were optimized using cross-validation, and their performance was assessed using R-squared, MAE, MSE, and RMSE.
- Methodology 2: Incorporated CDS data into XGBoost to examine its impact on bond pricing. The model was trained to minimize MSE and evaluated with performance metrics similar to those in Methodology 1.
- Methodology 3: A reduced-form model used bond metrics such as coupon rate, yield, and credit rating to assess default risk probabilistically and price bonds, in contrast to the baseline discounted cash flow (DCF) approach.
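To make the contrast in Methodology 3 concrete, the sketch below prices the same bond two ways: a plain DCF of the promised cash flows, and a simple reduced-form price that weights each cash flow by a survival probability and adds a recovery payment on default. This is an illustration only, not the study's model. The flat continuously compounded rate, the constant hazard rate, annual coupons, recovery paid as a fraction of face value at the end of the default year, and all function and parameter names are assumptions introduced here.

```python
import math


def dcf_price(face, coupon_rate, rate, years):
    """Baseline DCF: discount promised annual cash flows at a flat rate."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon * math.exp(-rate * t) for t in range(1, years + 1))
    return pv_coupons + face * math.exp(-rate * years)


def reduced_form_price(face, coupon_rate, rate, years, hazard, recovery):
    """Reduced-form price under an assumed constant hazard rate.

    Each coupon is paid only if the issuer has survived to that date.
    If default occurs during year t, the holder receives recovery * face
    at the end of year t (a simplifying assumption).
    """
    coupon = face * coupon_rate
    price = 0.0
    for t in range(1, years + 1):
        discount = math.exp(-rate * t)
        survival_prev = math.exp(-hazard * (t - 1))
        survival = math.exp(-hazard * t)
        # Coupon received only in the survival state.
        price += coupon * discount * survival
        # Recovery received if default happens within year t.
        price += recovery * face * discount * (survival_prev - survival)
    # Principal repaid only if the issuer survives to maturity.
    price += face * math.exp(-rate * years) * math.exp(-hazard * years)
    return price


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    baseline = dcf_price(face=100, coupon_rate=0.04, rate=0.03, years=5)
    risky = reduced_form_price(
        face=100, coupon_rate=0.04, rate=0.03, years=5, hazard=0.02, recovery=0.4
    )
    print(f"DCF price:          {baseline:.2f}")
    print(f"Reduced-form price: {risky:.2f}")
```

With a hazard rate of zero the two functions agree. A positive hazard rate with recovery below par pulls the reduced-form price under the DCF baseline, which is the credit discount such a model is meant to capture. In practice the hazard rate would be tied to inputs such as credit rating rather than set by hand.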
Empirical Results - Price Predictions and Trading Strategies
Pricing Models: Random forest and gradient boosting models outperformed OLS in pricing accuracy, though adding CDS data reduced effectiveness due to overfitting.
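The accuracy comparison rests on the four metrics named in the methodology: R-squared, MAE, MSE, and RMSE. As a reminder of how they relate, here is a minimal pure-Python sketch. The function name and the sample prices are hypothetical and do not come from the study.

```python
import math


def regression_metrics(actual, predicted):
    """Return R-squared, MAE, MSE, and RMSE for paired price series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    # R-squared: share of the variance in actual prices explained by the model.
    r2 = 1.0 - ss_residual / ss_total
    return {"r2": r2, "mae": mae, "mse": mse, "rmse": rmse}


if __name__ == "__main__":
    # Made-up prices, for illustration only.
    observed = [100.0, 102.0, 98.0, 101.0]
    modeled = [101.0, 101.0, 99.0, 100.0]
    print(regression_metrics(observed, modeled))
```

MAE and RMSE are in price units, so they are easiest to read as a typical pricing error, while R-squared is unitless and better suited to comparing models on the same test set.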
- Trading Strategy 1: Based on daily bond rebalancing with threshold-based mispricing detection, this strategy simulated realistic costs and showed how models adapted to macroeconomic changes.
- Trading Strategy 2: A monthly strategy used bond and macroeconomic indicators to identify significant mispricings, with thresholds ranging from 1% to 10%. A 5% mispricing threshold proved effective, and long/short trades outperformed long-only trades.
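The threshold logic behind both strategies can be sketched as a simple signal function. This is a hedged illustration rather than the study's implementation. The function name, the sign convention (go long when the model price is above the market price), and measuring the gap relative to the market price are assumptions. Only the 5% default threshold and the long/short versus long-only distinction come from the summary above.

```python
def mispricing_signal(market_price, model_price, threshold=0.05, allow_short=True):
    """Return +1 (long), -1 (short), or 0 (no trade) for one bond.

    The gap is the model price relative to the market price. A bond whose
    model price exceeds the market price by more than the threshold is
    treated as cheap; the reverse is treated as rich.
    """
    gap = (model_price - market_price) / market_price
    if gap > threshold:
        return 1
    if gap < -threshold and allow_short:
        return -1
    return 0


if __name__ == "__main__":
    # Hypothetical quotes: (market price, model price).
    quotes = [(100.0, 106.0), (100.0, 94.0), (100.0, 103.0)]
    for market, model in quotes:
        long_short = mispricing_signal(market, model)
        long_only = mispricing_signal(market, model, allow_short=False)
        print(market, model, long_short, long_only)
```

Setting allow_short=False gives the long-only variant, so the same predictions can drive both versions of the strategy and the two can be compared at each threshold.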
Conclusion
The research highlights the effectiveness of machine learning for bond pricing and trading strategies, with ensemble models outperforming traditional regression methods. Adding CDS data led to overfitting, and the study concludes that simpler, well-validated models perform best, underscoring the importance of data-driven decision-making in bond markets.