K-Means is a popular clustering algorithm that partitions data points into k clusters. In the financial industry, grouping funds or assets can isolate behaviors and define investment universes using any number of performance measures, holdings, or alternative features. Running standard K-Means independently at each time increment produces extremely unstable results because of random initialization and cluster mislabeling. Robust Rolling K-Means (R2K-Means) extends K-Means to time series, allowing investors to dynamically track and group funds in a stable and updateable framework.
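Cluster mislabeling means that two runs can find the same partition but number the clusters differently, so a fund appears to switch clusters even though nothing changed. The minimal sketch below shows one generic remedy: each new clustering is relabeled to agree as closely as possible with the previous one. It is an illustration only, not the R2K-Means procedure from the paper, and `align_labels` is a hypothetical helper.

```python
from itertools import permutations

def align_labels(prev_labels, new_labels, k):
    """Rename the cluster ids in new_labels to best match prev_labels.

    Hypothetical helper: brute-force search over all k! renamings,
    which is only practical for small k.
    """
    best_perm, best_overlap = None, -1
    for perm in permutations(range(k)):
        # perm[j] is the id that new cluster j is renamed to
        overlap = sum(1 for p, n in zip(prev_labels, new_labels) if perm[n] == p)
        if overlap > best_overlap:
            best_perm, best_overlap = perm, overlap
    return [best_perm[n] for n in new_labels]

prev = [0, 0, 1, 1, 2, 2]
new = [2, 2, 0, 0, 1, 1]  # same partition, different cluster ids
print(align_labels(prev, new, 3))  # [0, 0, 1, 1, 2, 2]
```

For larger k, an assignment solver such as the Hungarian algorithm would replace the brute-force loop.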
Since a learning-based model is only as powerful as the data it trains on, the greater stability of R2K-Means (versus standard K-Means) makes it a better candidate for use across AI-based applications.
Refer to our technical paper¹ (Hirsa, Klinkert, Malhotra, Holmes, 2024) for the financial engineering and implementation details.
In the following animations, we demonstrate the methodology on mutual fund datasets and illustrate its advantages in terms of applicability and explainability.
Section: Nonlinear Classification
R2K-Means uses a rolling window, allowing clusterings to form nonlinear decision boundaries that better capture the natural shapes of the data. The animation below depicts the nonlinear decision boundaries of R2K-Means on 180 small-cap mutual funds over time.
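The mechanics of a rolling window can be sketched as follows. The sketch assumes, hypothetically, that each period provides one feature vector per fund and that the window simply pools the most recent `window` periods into a single training set. How R2K-Means actually builds its windows and obtains nonlinear boundaries is described in the paper, not here. `rolling_windows` is a hypothetical helper, and the toy values are illustrative only.

```python
from collections import deque

def rolling_windows(snapshots, window):
    """Yield (t, pooled) once `window` periods are available.

    snapshots: list indexed by period t; each item maps fund id -> feature tuple.
    pooled: every (fund, features) observation from periods t-window+1 .. t.
    """
    buffer = deque(maxlen=window)  # the oldest period drops out automatically
    for t, snapshot in enumerate(snapshots):
        buffer.append(snapshot)
        if len(buffer) == window:
            pooled = [(fund, x) for period in buffer for fund, x in period.items()]
            yield t, pooled

# Toy panel: two funds, one feature, three periods (illustrative values only)
snapshots = [
    {"A": (0.10,), "B": (0.90,)},
    {"A": (0.20,), "B": (0.80,)},
    {"A": (0.15,), "B": (0.85,)},
]
for t, pooled in rolling_windows(snapshots, window=2):
    print(t, pooled)
```

Under this assumption, each pooled set would be clustered in turn as the window advances one period at a time.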
Section: Centroid Stability
The two animations below illustrate the stability differences between naive K-Means and R2K-Means. Notice how the centroids in naive K-Means frequently change location. This is primarily attributable to random initialization and cluster mislabeling, two effects that have no relationship with the data itself. The results of R2K-Means are much more stable and reflect only changes in the dataset. This is one of the benefits of using R2K-Means: if the data itself does not change, the clustering results will not change either.
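The stability property above can be illustrated with a toy sketch. This is plain Lloyd-style K-Means, not the R2K-Means algorithm itself, and `sq_dist`, `assign`, and `lloyd` are hypothetical helpers. On identical data, re-initializing can flip the cluster ids. To keep the output reproducible, a fixed swapped start stands in for a random draw. Warm-starting from the previous period's converged centroids reproduces the previous result exactly, because converged centroids are a fixed point of the Lloyd update on unchanged data.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def lloyd(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the assigned points; keep the old centroid if the cluster is empty
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else c)
        if new == centroids:  # fixed point reached
            break
        centroids = new
    return centroids, assign(points, centroids)

# Identical data in two consecutive periods (toy values)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

c1, l1 = lloyd(points, [(5.0, 5.0), (0.0, 0.0)])        # period 1
_, l2_naive = lloyd(points, [(0.0, 0.0), (5.0, 5.0)])   # period 2, re-initialized
c2, l2_warm = lloyd(points, c1)                          # period 2, warm-started

print(l1)                       # [1, 1, 1, 0, 0, 0]
print(l2_naive)                 # [0, 0, 0, 1, 1, 1]  same partition, ids flipped
print(l2_warm == l1, c2 == c1)  # True True
```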
- Hirsa, Ali, Holmes, Ryan, Klinkert, Federico, and Malhotra, Satyan. “Robust Rolling K-Means (R2K-Means): an Updateable Nonlinear K-Means Clustering Methodology for Financial Time Series”. Working paper, forthcoming on SSRN (2024).