Principal Component Analysis (PCA) is an important methodology for reducing the dimensionality of large data-sets and extracting meaningful signals from them. Financial markets introduce time dependence and non-stationarity, so applying standard PCA methods may not give stable results. Our robust rolling PCA (R2-PCA) accommodates these additional aspects and mitigates commonly encountered obstacles, including eigenvector sign flipping and the management of multiple dimensions of the data-set. Since a learning-based model is only as powerful as the data it trains on, the more stable results of R2-PCA (versus standard PCA) make it a better candidate for use across AI-based applications.
Refer to our technical paper[1] (Hirsa, Klinkert, Malhotra, and Holmes, 2023) for the financial engineering and implementation details.
In the following animations, we demonstrate the methodology on mutual fund and simulated data-sets and illustrate its advantages in terms of applicability and explainability.
Comparing Standard PCA and R2-PCA
The essential premise of the R2-PCA algorithm is to use the cosine similarity measure to compare the eigenvectors at the current time increment with those at the previous increment. Using these similarity scores, the R2-PCA algorithm unflips (reverses the sign of) and reorders the eigenvectors so that each one stays consistent with its predecessor. As a result, the reduced time series output of R2-PCA more accurately preserves the characteristics of the original dataset, which improves the performance of learning models that use the reduced data as input.
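The matching step can be sketched in Python with NumPy. This is a minimal illustration, not the R2-PCA implementation from the paper: the function name `align_eigenvectors` is hypothetical, and greedy matching on absolute cosine similarity is an assumption (the paper may use a different assignment rule). Each previous eigenvector is paired with its most similar unused current eigenvector, and that eigenvector's sign is reversed when the cosine similarity is negative.

```python
import numpy as np


def align_eigenvectors(prev_vecs: np.ndarray, curr_vecs: np.ndarray) -> np.ndarray:
    """Reorder and unflip the columns of curr_vecs so they line up with prev_vecs.

    Both arrays have shape (n_features, k) with unit-norm columns. Matching is
    greedy on absolute cosine similarity (an assumption for illustration).
    """
    # Columns are unit-norm, so these dot products are cosine similarities.
    sim = prev_vecs.T @ curr_vecs  # sim[i, j] = cos(prev_i, curr_j)
    k = sim.shape[1]
    aligned = np.empty_like(curr_vecs)
    unused = set(range(k))
    # Handle the previous eigenvectors with the most clear-cut match first.
    for i in np.argsort(-np.abs(sim).max(axis=1)):
        j = max(unused, key=lambda c: abs(sim[i, c]))
        unused.remove(j)
        # A negative cosine similarity means the sign flipped: undo it.
        sign = 1.0 if sim[i, j] >= 0 else -1.0
        aligned[:, i] = sign * curr_vecs[:, j]
    return aligned


# Usage: previous eigenvectors q; current ones reordered with two signs flipped.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
curr = q[:, [2, 0, 1]] * np.array([1.0, -1.0, -1.0])
print(np.allclose(align_eigenvectors(q, curr), q))  # True
```

A greedy match is simple and works well when components are clearly separated; when several similarity scores are close, an optimal assignment (for example, the Hungarian algorithm) would be a more robust choice.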
Animation 1[2]: Comparing Standard PCA and R2-PCA
The animation above shows the results of the R2-PCA model on the Large Cap and US Aggregate time series datasets. In the lower plot, eigenvector sign flips are marked with a circle; at a flip, the data typically appears to be multiplied by -1, i.e. reflected over the x-axis. Such flips can drastically reduce how accurately the reduced data represents the original. In the R2-PCA results, the sign-flipping problem does not arise, and the output is much smoother than its standard PCA counterpart.
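The sign ambiguity is easy to reproduce. An eigenvector is only defined up to sign, and `numpy.linalg.eigh` makes no promise about which sign it returns, so consecutive rolling windows can report opposite signs for essentially the same direction, and the reduced series jumps by a factor of -1. The sketch below assumes hypothetical data (three correlated series standing in for fund returns) and a 60-observation window; the function names are hypothetical, and the simple sign rule is an illustration rather than the full R2-PCA procedure.

```python
import numpy as np


def rolling_leading_pc(data: np.ndarray, window: int, align: bool = False):
    """Leading eigenvector and newest-observation score for each rolling window.

    align=False is plain rolling PCA. align=True applies a simple sign rule
    (assumed for illustration): negate the eigenvector if its cosine similarity
    with the previous window's eigenvector is negative.
    """
    vecs, scores = [], []
    for t in range(window, data.shape[0] + 1):
        block = data[t - window:t]
        _, eigvecs = np.linalg.eigh(np.cov(block, rowvar=False))
        v = eigvecs[:, -1]  # eigh sorts eigenvalues ascending, so last = leading
        if align and vecs and v @ vecs[-1] < 0:
            v = -v  # unflip
        vecs.append(v)
        # Reduced (1-D) value of the newest observation in this window.
        scores.append((block[-1] - block.mean(axis=0)) @ v)
    return np.array(vecs), np.array(scores)


def count_sign_flips(vecs: np.ndarray) -> int:
    """Number of consecutive windows whose leading eigenvectors point in opposite directions."""
    return int(np.sum(np.sum(vecs[1:] * vecs[:-1], axis=1) < 0))


# Hypothetical data: three correlated series standing in for asset returns.
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.4], [0.0, 0.0, 1.0]])
returns = rng.standard_normal((300, 3)) @ mixing
raw_vecs, raw_scores = rolling_leading_pc(returns, window=60)
aligned_vecs, aligned_scores = rolling_leading_pc(returns, window=60, align=True)
print(count_sign_flips(raw_vecs), count_sign_flips(aligned_vecs))
```

With alignment the flip count is always 0, and each aligned score equals the raw score up to a factor of ±1; the raw count depends on the data and the solver.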
Animation 2[3]: Covariance Instability (results using the same simulated data across standard PCA, IPCA, and R2-PCA)
Unsupervised learning models sometimes require back-tests on artificial data-sets with desired characteristics to best assess performance. The animations above use a back-test that generates artificial data and projects it into a cross-like shape. The principal components, shown as arrows, should track the data as it is rotated by an angle θ at each time increment. The larger θ is, the more unstable the dataset becomes. The examples in the animations use θ = 0.3. The R2-PCA animation shows that the algorithm tracks the direction of the data across time while avoiding the eigenvector sign flips seen in the standard PCA animation.
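A back-test in the same spirit can be sketched as follows. The generator `cross_data`, its arm lengths and noise level, and the reading of θ = 0.3 as radians are all assumptions for illustration, not the paper's simulation settings. The data are rotated by θ per step, the leading eigenvector is recomputed at each step, and a copy of it is sign-aligned with the previous step.

```python
import numpy as np


def cross_data(n: int = 2000, long_arm: float = 5.0, short_arm: float = 2.0,
               noise: float = 0.1, seed: int = 0) -> np.ndarray:
    """Hypothetical generator: 2-D points on two perpendicular arms forming a '+' shape."""
    rng = np.random.default_rng(seed)
    half = n // 2
    # Longer horizontal arm and shorter vertical arm, each with small Gaussian noise.
    horiz = np.column_stack([rng.uniform(-long_arm, long_arm, half), rng.normal(0.0, noise, half)])
    vert = np.column_stack([rng.normal(0.0, noise, n - half), rng.uniform(-short_arm, short_arm, n - half)])
    return np.vstack([horiz, vert])


def rotation(angle: float) -> np.ndarray:
    """2-D counter-clockwise rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])


def track_leading_pc(base: np.ndarray, theta: float, steps: int):
    """Rotate the data by theta per step; return raw and sign-aligned leading eigenvectors."""
    raw, aligned = [], []
    for t in range(steps):
        data = base @ rotation(t * theta).T  # rotate every point (row) by t * theta
        _, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
        v = eigvecs[:, -1]  # leading eigenvector: direction of the long arm
        raw.append(v)
        # Unflip if it points away from the previous aligned eigenvector.
        if aligned and v @ aligned[-1] < 0:
            v = -v
        aligned.append(v)
    return np.array(raw), np.array(aligned)


theta, steps = 0.3, 20  # theta assumed to be in radians
raw, aligned = track_leading_pc(cross_data(), theta, steps)
raw_flips = int(np.sum(np.sum(raw[1:] * raw[:-1], axis=1) < 0))
print("sign flips without alignment:", raw_flips)
```

Because each rotation step (0.3 rad) is well below 90 degrees, the aligned eigenvector keeps a positive cosine similarity with its predecessor and follows the long arm continuously, while the raw eigenvectors carry whatever sign the solver returns. This sketch handles only the sign of the leading component; it does not reorder components.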
Contact us for implementations.
Email info@ask2.ai with questions.
[1] Hirsa, Ali and Klinkert, Federico and Malhotra, Satyan and Holmes, Ryan, Robust Rolling PCA: Managing Time Series and Multiple Dimensions (March 25, 2023). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4400158
[2] Refer to Section 7.2 of [1].
[3] Refer to Section 8.2 of [1].