Note
The following implementation and documentation closely follow the work of Dr. Gautier Marti: CorrGAN: Sampling Realistic Financial Correlation Matrices using Generative Adversarial Networks.
Warning
In order to use this module, you should additionally install TensorFlow. For more details, visit our MlFinLab installation guide.
CorrGAN
Dr. Gautier Marti proposed a novel approach for sampling realistic financial correlation matrices. He used a generative adversarial network (a GAN, named CorrGAN) to recover most of the known “stylized facts” about empirical correlation matrices based on asset returns.
It was trained on approximately 10,000 empirical correlation matrices estimated on S&P 500 returns sorted by a permutation induced by a hierarchical clustering algorithm.
Gautier Marti found that previous methods for generating realistic correlation matrices were lacking. Other authors likewise observed that “there is no algorithm available for the generation of reasonably random financial correlation matrices with the Perron-Frobenius property. […] we expect the task of finding such correlation matrices to be highly complex.”
The Perron-Frobenius property is one of many “stylized facts” that financial correlation matrices exhibit, and it is difficult to reproduce with previous methods.
For these reasons, being able to generate any number of realistic correlation matrices is a game changer.
CorrGAN generates correlation matrices that have many “stylized facts” seen in empirical correlation matrices. The stylized facts CorrGAN recovered are:
- Distribution of pairwise correlations is significantly shifted to the positive.
- Eigenvalues follow the Marchenko-Pastur distribution, but for a very large first eigenvalue (the market).
- Eigenvalues follow the Marchenko-Pastur distribution, but for a couple of other large eigenvalues (industries).
- Perron-Frobenius property (first eigenvector has positive entries).
- Hierarchical structure of correlations.
- Scale-free property of the corresponding Minimum Spanning Tree (MST).
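Several of the stylized facts listed above can be checked numerically on any correlation matrix. The following is a minimal sketch, not part of the library: the helper name `check_stylized_facts` and the parameter `n_obs` (the number of return observations assumed when computing the Marchenko-Pastur upper edge) are hypothetical and introduced only for illustration.

```python
import numpy as np


def check_stylized_facts(corr, n_obs):
    """Compute simple diagnostics for a few stylized facts.

    n_obs is a hypothetical number of return observations; it is only
    needed for the Marchenko-Pastur upper edge.
    """
    corr = np.asarray(corr, dtype=float)
    dim = corr.shape[0]

    # Pairwise correlations: the off-diagonal mean should be positive.
    off_diag = corr[np.triu_indices(dim, k=1)]
    mean_corr = off_diag.mean()

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Eigenvalues above the Marchenko-Pastur upper edge
    # (1 + sqrt(dim / n_obs))^2 are not explained by pure noise.
    mp_upper = (1.0 + np.sqrt(dim / n_obs)) ** 2
    n_above_mp = int((eigvals > mp_upper).sum())

    # Perron-Frobenius property: the sign of an eigenvector is arbitrary,
    # so flip it to a positive sum before checking its entries.
    first_vec = eigvecs[:, -1]
    if first_vec.sum() < 0:
        first_vec = -first_vec
    perron_frobenius = bool((first_vec > 0).all())

    return {
        "mean_corr": float(mean_corr),
        "largest_eigenvalue": float(eigvals[-1]),
        "n_above_mp": n_above_mp,
        "perron_frobenius": perron_frobenius,
    }


# Toy input: every pair of assets has correlation 0.3 (a single "market" factor).
toy = np.full((20, 20), 0.3)
np.fill_diagonal(toy, 1.0)
facts = check_stylized_facts(toy, n_obs=252)
```

The toy matrix stands in for a sampled matrix. The hierarchical structure and the MST property are not covered by this sketch.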
Implementation
- sample_from_corrgan(model_loc, dim=10, n_samples=1)
Samples correlation matrices from the pre-trained CorrGAN network.
It is reproduced with modifications from the following paper: Marti, G., 2020, May. CorrGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8459-8463). IEEE.
It loads the appropriate CorrGAN model for the required dimension, generates a matrix output from this network, symmetrizes this matrix, and finds the nearest correlation matrix that is positive semi-definite. Finally, it arranges the matrix with hierarchical clustering by maximizing the sum of the similarities between adjacent leaves.
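The post-processing steps described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `postprocess` is hypothetical, the projection step uses simple eigenvalue clipping with rescaling rather than a true nearest-correlation-matrix solver, and the reordering relies on SciPy's optimal leaf ordering with average linkage, which may differ from the choices made in `sample_from_corrgan`. A random matrix stands in for the generator output.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def postprocess(raw):
    """Turn a raw square matrix into an ordered, valid correlation matrix."""
    raw = np.asarray(raw, dtype=float)

    # Step 1: symmetrize and force a unit diagonal.
    sym = (raw + raw.T) / 2.0
    np.fill_diagonal(sym, 1.0)

    # Step 2: clip negative eigenvalues to obtain a PSD matrix, then rescale
    # so that the diagonal is 1 again. This is an approximation, not an
    # exact nearest-correlation-matrix algorithm.
    eigvals, eigvecs = np.linalg.eigh(sym)
    psd = (eigvecs * np.clip(eigvals, 1e-8, None)) @ eigvecs.T
    scale = np.sqrt(np.diag(psd))
    corr = psd / np.outer(scale, scale)
    corr = np.clip((corr + corr.T) / 2.0, -1.0, 1.0)
    np.fill_diagonal(corr, 1.0)

    # Step 3: hierarchical clustering on the distance 1 - corr. With
    # optimal_ordering=True, SciPy reorders the leaves to minimize the sum
    # of distances between adjacent leaves, which is equivalent to
    # maximizing the sum of their similarities.
    dist = squareform(1.0 - corr, checks=False)
    order = leaves_list(linkage(dist, method="average", optimal_ordering=True))
    return corr[np.ix_(order, order)]


rng = np.random.default_rng(0)
raw = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a generator output
ordered = postprocess(raw)
```

Reordering rows and columns with the same permutation leaves the eigenvalues unchanged, so the result stays positive semi-definite.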
The CorrGAN network was trained on the correlation profiles of the S&P 500 stocks. Therefore the output retains these properties. In addition, the final output retains the following 6 stylized facts:
1. Distribution of pairwise correlations is significantly shifted to the positive.
2. Eigenvalues follow the Marchenko-Pastur distribution, but for a very large first eigenvalue (the market).
3. Eigenvalues follow the Marchenko-Pastur distribution, but for a couple of other large eigenvalues (industries).
4. Perron-Frobenius property (first eigenvector has positive entries).
5. Hierarchical structure of correlations.
6. Scale-free property of the corresponding Minimum Spanning Tree (MST).
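To inspect the last property in the list, one can build the MST from a correlation matrix and look at its node degrees. The sketch below is an assumption-laden illustration: the helper name `mst_degrees` is hypothetical, and the distance sqrt(2 (1 - rho)) is a common choice that this documentation does not itself specify. It only computes the degrees; judging whether their distribution is scale-free requires a larger matrix and a separate fit.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree


def mst_degrees(corr):
    """Return the degree of each node in the MST built from a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # Correlation distance; the clip guards against tiny negative values.
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # Note: SciPy treats zero entries as missing edges, so pairs with a
    # correlation of exactly 1 would be ignored by this sketch.
    mst = minimum_spanning_tree(dist).toarray()
    # Each edge is stored once, so mirror the matrix before counting.
    adjacency = (mst + mst.T) > 0
    return adjacency.sum(axis=1)


# Toy input: asset 0 acts as a hub that all other assets follow.
hub = np.full((6, 6), 0.81)
hub[0, :] = 0.9
hub[:, 0] = 0.9
np.fill_diagonal(hub, 1.0)
degrees = mst_degrees(hub)
```

In the toy matrix every asset is closest to asset 0, so the MST is a star centered on it.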
- Parameters:
  - model_loc – (str) Location of folder containing CorrGAN models.
  - dim – (int) Dimension of correlation matrix to sample. In the range [2, 200].
  - n_samples – (int) Number of samples to generate.
- Returns:
  - (np.array) Sampled correlation matrices of shape (n_samples, dim, dim).
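Given the documented return shape (n_samples, dim, dim), a quick sanity check on the output could look like the sketch below. The helper `validate_corr_samples` and its tolerance `tol` are hypothetical and not part of the library. A batch of identity matrices stands in for real samples so that the snippet runs without the trained models.

```python
import numpy as np


def validate_corr_samples(samples, n_samples, dim, tol=1e-6):
    """Check that an array looks like a batch of valid correlation matrices."""
    samples = np.asarray(samples, dtype=float)
    if samples.shape != (n_samples, dim, dim):
        return False
    symmetric = np.allclose(samples, np.transpose(samples, (0, 2, 1)), atol=tol)
    unit_diag = np.allclose(np.diagonal(samples, axis1=1, axis2=2), 1.0, atol=tol)
    in_range = bool((np.abs(samples) <= 1.0 + tol).all())
    # eigvalsh accepts a stack of symmetric matrices.
    psd = bool((np.linalg.eigvalsh(samples) >= -tol).all())
    return bool(symmetric and unit_diag and in_range and psd)


# Stand-in batch: identity matrices are trivially valid correlation matrices.
batch = np.stack([np.eye(5)] * 3)
ok = validate_corr_samples(batch, n_samples=3, dim=5)
```

With real output, `batch` would be replaced by the array returned from `sample_from_corrgan`.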
Example
Note
Because the trained CorrGAN models are too large to be included in the mlfinlab package (> 100 MB), we provide them as a separate downloadable package here. Extract the corrgan_models folder and copy its file path into the code below where indicated.
Note
The higher the dimension, the longer it takes CorrGAN to generate a sample. For more information refer to the research notebook.
import matplotlib.pyplot as plt
from mlfinlab.data_generation.corrgan import sample_from_corrgan
# Sample from CorrGAN has shape (n_samples, dim, dim).
# For this example it corresponds to (4, 10, 10).
# Update the file path below with the location of your 'corrgan_models' folder.
corr_mats = sample_from_corrgan(model_loc="../corrgan_models", dim=10, n_samples=4)
# Plots the correlation matrices generated from CorrGAN in a pseudocolor plot.
plt.figure(figsize=(12, 8))
for i in range(min(4, len(corr_mats))):
    plt.subplot(2, 2, i + 1)
    plt.pcolormesh(corr_mats[i][:, :], cmap='viridis')
    plt.colorbar()
plt.show()
Research Notebook
The following research notebook can be used to better understand the sampled correlation matrices.
- CorrGAN - Realistic Financial Correlation Matrices