mlfinlab.data_generation.corrgan

Implementation of sampling realistic financial correlation matrices from “CorrGAN: Sampling Realistic Financial Correlation Matrices using Generative Adversarial Networks” by Gautier Marti. https://arxiv.org/pdf/1910.09504.pdf

Module Contents

Functions

sample_from_corrgan(model_loc[, dim, n_samples])

Samples correlation matrices from the pre-trained CorrGAN network.

sample_from_corrgan(model_loc, dim=10, n_samples=1)

Samples correlation matrices from the pre-trained CorrGAN network.

It is reproduced with modifications from the following paper: Marti, G., 2020, May. CorrGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8459-8463). IEEE.

The function loads the CorrGAN model that matches the requested dimension and generates a matrix from this network. It then symmetrizes the matrix and finds the nearest correlation matrix that is positive semi-definite. Finally, it reorders the matrix with hierarchical clustering, choosing the leaf order that maximizes the sum of the similarities between adjacent leaves.
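The post-processing steps above can be sketched as follows. This is a minimal illustration, not the library's actual implementation. The helper name `postprocess_raw_matrix` is hypothetical, a random array stands in for the network output, eigenvalue clipping is used as a simple approximation of the nearest-correlation-matrix step, and the Ward linkage with `1 - corr` as distance is an assumption.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform


def postprocess_raw_matrix(raw):
    """Turn a raw square array into an ordered correlation matrix (illustrative only)."""
    # 1. Symmetrize and force a unit diagonal.
    sym = (raw + raw.T) / 2.0
    np.fill_diagonal(sym, 1.0)

    # 2. Approximate the nearest PSD correlation matrix: clip negative
    #    eigenvalues, then rescale so the diagonal is one again.
    #    (Assumption: a simplification, not an exact nearest-matrix solver.)
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 1e-8, None)
    psd = (eigvecs * eigvals) @ eigvecs.T
    scale = np.sqrt(np.diag(psd))
    corr = psd / np.outer(scale, scale)
    corr = (corr + corr.T) / 2.0
    np.fill_diagonal(corr, 1.0)

    # 3. Reorder rows and columns with hierarchical clustering. With
    #    optimal_ordering=True, SciPy picks the leaf order that minimizes the
    #    distance (maximizes the similarity) between adjacent leaves.
    dist = np.clip(1.0 - corr, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    link = hierarchy.linkage(condensed, method="ward", optimal_ordering=True)
    order = hierarchy.leaves_list(link)
    return corr[np.ix_(order, order)]


# Stand-in for a generator output (assumption: any square array will do here).
rng = np.random.default_rng(0)
raw = rng.uniform(-1.0, 1.0, size=(6, 6))
corr = postprocess_raw_matrix(raw)
```

The result is symmetric, has a unit diagonal, and has no negative eigenvalues beyond rounding error. The reordering only permutes rows and columns, so it leaves the eigenvalues unchanged.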

The CorrGAN network was trained on the correlation profiles of S&P 500 stocks, so its output retains their properties. In addition, the final output retains the following six stylized facts:

  1. Distribution of pairwise correlations is significantly shifted to the positive.

  2. Eigenvalues follow the Marchenko-Pastur distribution, but for a very large first eigenvalue (the market).

  3. Eigenvalues follow the Marchenko-Pastur distribution, but for a couple of other large eigenvalues (industries).

  4. Perron-Frobenius property (first eigenvector has positive entries).

  5. Hierarchical structure of correlations.

  6. Scale-free property of the corresponding Minimum Spanning Tree (MST).
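A few of these stylized facts can be checked numerically on any correlation matrix. The sketch below is illustrative only. The helper `stylized_fact_summary` is a hypothetical name and not part of mlfinlab, and a toy equicorrelation matrix stands in for a sampled one. It covers the mean pairwise correlation, the share of variance carried by the first eigenvalue, and the Perron-Frobenius property.

```python
import numpy as np


def stylized_fact_summary(corr):
    """Summarize a few stylized facts of a correlation matrix (illustrative only)."""
    dim = corr.shape[0]
    # Off-diagonal entries: the pairwise correlations.
    off_diag = corr[np.triu_indices(dim, k=1)]

    # eigh returns eigenvalues in ascending order, so the last one is the largest.
    eigvals, eigvecs = np.linalg.eigh(corr)
    first_vec = eigvecs[:, -1]
    # The sign of an eigenvector is arbitrary, so orient it before testing positivity.
    if first_vec.sum() < 0:
        first_vec = -first_vec

    return {
        "mean_pairwise_corr": float(off_diag.mean()),
        "top_eigenvalue_share": float(eigvals[-1] / eigvals.sum()),
        "perron_frobenius": bool(np.all(first_vec > 0)),
    }


# Toy stand-in (assumption): five assets with a constant correlation of 0.3.
dim = 5
toy = np.full((dim, dim), 0.3)
np.fill_diagonal(toy, 1.0)
summary = stylized_fact_summary(toy)
```

For this toy matrix the largest eigenvalue is 1 + 4 × 0.3 = 2.2 out of a total of 5, so the top eigenvalue share is 0.44 and the first eigenvector has strictly positive entries.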

Parameters:
  • model_loc – (str) Location of folder containing CorrGAN models.

  • dim – (int) Dimension of correlation matrix to sample. In the range [2, 200].

  • n_samples – (int) Number of samples to generate.

Returns:

(np.array) Sampled correlation matrices of shape (n_samples, dim, dim).
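As a usage sketch, the returned batch can be sanity-checked against the documented shape and the basic properties of a correlation matrix. The helper `validate_samples` and the folder name `corrgan_models` are hypothetical. The call to `sample_from_corrgan` is shown only as a comment because it needs mlfinlab and the pre-trained models, and a batch of identity matrices stands in for real samples.

```python
import numpy as np


def validate_samples(samples, dim, n_samples, tol=1e-6):
    """Check that a batch looks like (n_samples, dim, dim) correlation matrices."""
    samples = np.asarray(samples)
    if samples.shape != (n_samples, dim, dim):
        return False
    # Each matrix should equal its own transpose.
    symmetric = np.allclose(samples, np.swapaxes(samples, 1, 2), atol=tol)
    # Each matrix should have ones on the diagonal.
    unit_diag = np.allclose(np.diagonal(samples, axis1=1, axis2=2), 1.0, atol=tol)
    # eigvalsh works on stacked matrices; no eigenvalue should be clearly negative.
    psd = bool(np.all(np.linalg.eigvalsh(samples) >= -tol))
    return bool(symmetric and unit_diag and psd)


# With mlfinlab installed and the models available, the batch would come from
# the documented call (the folder name below is hypothetical):
#   samples = sample_from_corrgan(model_loc="corrgan_models", dim=10, n_samples=3)
# A stand-in batch of identity matrices keeps this sketch runnable on its own.
samples = np.stack([np.eye(10)] * 3)
is_valid = validate_samples(samples, dim=10, n_samples=3)
```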