Codependence Matrix



The functions in this part of the module generate dependence and distance matrices from the codependence and distance metrics described previously.

  1. The Dependence Matrix function computes codependences between the elements (columns) of a given dataframe using codependence metrics such as Mutual Information, Variation of Information, Distance Correlation, Spearman’s Rho, GPR distance, GNPR distance, and Optimal Transport.

  2. The Distance Matrix function computes a distance matrix from a given codependence matrix using the angular, squared angular, or absolute angular distance.
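Conceptually, a dependence matrix is built by applying a pairwise codependence measure to every pair of columns. The following sketch illustrates that idea with a from-scratch sample distance correlation (Székely et al.) written in NumPy and pandas. It is a minimal illustration under that assumption, not MlFinLab's implementation; the helper names `distance_correlation` and `dependence_matrix` are hypothetical.

```python
import numpy as np
import pandas as pd


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    # Pairwise absolute-difference matrices within each sample
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centre: subtract row and column means, add back the grand mean
    a = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    b = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2_xy = max((a * b).mean(), 0.0)  # clip tiny negative rounding errors
    dvar_x = (a * a).mean()
    dvar_y = (b * b).mean()
    return float(np.sqrt(dcov2_xy / np.sqrt(dvar_x * dvar_y)))


def dependence_matrix(df, metric):
    """Hypothetical helper: apply `metric` to every pair of columns of `df`."""
    cols = list(df.columns)
    out = pd.DataFrame(0.0, index=cols, columns=cols)
    for i, ci in enumerate(cols):
        for cj in cols[i:]:  # includes the diagonal; assumes metric is symmetric
            value = metric(df[ci].to_numpy(), df[cj].to_numpy())
            out.loc[ci, cj] = value
            out.loc[cj, ci] = value
    return out


# Toy data: "b" depends on "a" non-linearly, "c" is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=300)
df = pd.DataFrame({"a": x, "b": x ** 2, "c": rng.normal(size=300)})
dm = dependence_matrix(df, distance_correlation)
print(dm.round(2))
```

Distance correlation picks up the non-linear link between `a` and `b`, which a Pearson correlation would largely miss. This from-scratch version builds two n×n matrices, so it only suits moderate sample sizes. For Spearman's rho, pandas already returns the full matrix via `DataFrame.corr(method="spearman")`.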

Note

MlFinLab uses these functions in its clustered feature importance module, and they are also used in the PortfolioLab package.

Note

Underlying Literature

The following sources elaborate extensively on the topic:


Implementation

get_dependence_matrix(df: DataFrame, dependence_method: str, theta: float = 0.5, n_bins: int | None = None, normalize: bool = True, estimator: str = 'standard', target_dependence: str = 'comonotonicity', gaussian_corr: float = 0.7, var_threshold: float = 0.2) → DataFrame

This function returns a dependence matrix for the elements (columns) of the given dataframe, computed with the chosen dependence method.

List of supported algorithms to use for generating the dependence matrix: information_variation, mutual_information, distance_correlation, spearmans_rho, gpr_distance, gnpr_distance, optimal_transport.

Parameters:
  • df – (pd.DataFrame) Features.

  • dependence_method – (str) Algorithm to be used for generating the dependence matrix.

  • theta – (float) Type of information being tested in the GPR and GNPR distances. Falls in range [0, 1]. (0.5 by default)

  • n_bins – (int) Number of bins for discretization in information_variation and mutual_information. If None, the optimal number is calculated. (None by default)

  • normalize – (bool) Flag used to normalize the result to [0, 1] in information_variation and mutual_information. (True by default)

  • estimator – (str) Estimator to be used for calculation in mutual_information. [standard, standard_copula, copula_entropy] (standard by default)

  • target_dependence – (str) Type of target dependence to use in optimal_transport. [comonotonicity, countermonotonicity, gaussian, positive_negative, different_variations, small_variations] (comonotonicity by default)

  • gaussian_corr – (float) Correlation coefficient to use when creating gaussian and small_variations copulas. [from 0 to 1] (0.7 by default)

  • var_threshold – (float) Variation threshold to use in small_variations. Sets the relative area of correlation in a copula. [from 0 to 1] (0.2 by default)

Returns:

(pd.DataFrame) Dependence matrix.
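To make the n_bins and normalize parameters more concrete, here is a minimal NumPy sketch of a histogram-based variation of information between two series. It assumes a plug-in histogram estimator, the closed-form optimal-binning rule of Hacine-Gharbi et al. (as popularized by López de Prado) when n_bins is None, and normalization by the joint entropy. Whether MlFinLab makes exactly these choices is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def optimal_bins(n_obs, corr=None):
    """Hypothetical helper: Hacine-Gharbi et al. optimal bin count (assumed rule)."""
    if corr is None:
        # Univariate rule
        z = (8 + 324 * n_obs + 12 * (36 * n_obs + 729 * n_obs ** 2) ** 0.5) ** (1 / 3)
        return int(round(z / 6 + 2 / (3 * z) + 1 / 3))
    # Bivariate rule; diverges as |corr| -> 1, so pass n_bins explicitly in that case
    return int(round(2 ** -0.5 * (1 + (1 + 24 * n_obs / (1 - corr ** 2)) ** 0.5) ** 0.5))


def shannon_entropy(counts):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def variation_of_information(x, y, n_bins=None, normalize=True):
    """Hypothetical helper: histogram-based variation of information of x and y."""
    if n_bins is None:
        n_bins = optimal_bins(len(x), corr=np.corrcoef(x, y)[0, 1])
    joint = np.histogram2d(x, y, bins=n_bins)[0]  # rows index x bins, columns y bins
    h_x = shannon_entropy(joint.sum(axis=1))      # marginal entropy of x
    h_y = shannon_entropy(joint.sum(axis=0))      # marginal entropy of y
    h_xy = shannon_entropy(joint.ravel())         # joint entropy
    mutual_info = h_x + h_y - h_xy
    vi = h_x + h_y - 2 * mutual_info
    if normalize:
        vi /= h_xy  # normalized VI lies in [0, 1]
    return vi


rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = x + rng.normal(scale=0.5, size=1000)  # strongly dependent on x
z = rng.normal(size=1000)                 # independent of x
print(variation_of_information(x, y), variation_of_information(x, z))
```

Lower values mean more shared information: the dependent pair (x, y) scores well below the independent pair (x, z), which sits close to 1 after normalization.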

get_distance_matrix(dependence_matrix: DataFrame, metric: str = 'angular') → DataFrame

Applies a distance operator to a dependence matrix.

This turns a correlation matrix into a distance matrix. The distances used are true metrics.

List of supported distance metrics to use for generating the distance matrix: angular, squared_angular, and absolute_angular.

Parameters:
  • dependence_matrix – (pd.DataFrame) Dataframe to which the distance operator is applied (a matrix).

  • metric – (str) The distance metric used to generate the distance matrix [‘angular’, ‘squared_angular’, ‘absolute_angular’].

Returns:

(pd.DataFrame) Distance matrix.
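The three metrics can be written as element-wise transforms of a correlation matrix ρ. The sketch below assumes the definitions d = sqrt((1 - ρ) / 2) for angular, d = sqrt((1 - ρ²) / 2) for squared angular, and d = sqrt((1 - |ρ|) / 2) for absolute angular, as used in López de Prado's work. MlFinLab's exact implementation (for example, any rounding) may differ, and distance_from_correlation is a hypothetical helper, not part of the library.

```python
import numpy as np
import pandas as pd


def distance_from_correlation(corr, metric="angular"):
    """Hypothetical helper: map a correlation matrix to a distance matrix."""
    c = corr.to_numpy(dtype=float)
    if metric == "angular":
        inner = 1.0 - c            # sign matters: rho = -1 gives the largest distance
    elif metric == "squared_angular":
        inner = 1.0 - c ** 2       # rho = +1 and rho = -1 both give zero distance
    elif metric == "absolute_angular":
        inner = 1.0 - np.abs(c)    # same zero set as squared_angular
    else:
        raise ValueError(f"Unknown metric: {metric}")
    # Clip tiny negative values from floating-point error before the square root
    dist = np.sqrt(np.clip(inner, 0.0, None) / 2.0)
    return pd.DataFrame(dist, index=corr.index, columns=corr.columns)


corr = pd.DataFrame(
    [[1.0, 0.5, -1.0], [0.5, 1.0, -0.5], [-1.0, -0.5, 1.0]],
    index=list("xyz"),
    columns=list("xyz"),
)
for metric in ("angular", "squared_angular", "absolute_angular"):
    print(metric)
    print(distance_from_correlation(corr, metric).round(3))
```

Angular distance treats perfectly anti-correlated series as maximally distant, while the squared and absolute variants treat them as identical. Choose the variant that matches whether the sign of the dependence matters for the task at hand.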


Example

import pandas as pd

# Import MLFinLab tools
from mlfinlab.codependence.codependence_matrix import (
    get_dependence_matrix,
    get_distance_matrix,
)

# Pull example stock price data from GitHub
url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/stock_prices.csv"
asset_returns = pd.read_csv(url, index_col="Date").pct_change().dropna()

# Calculate distance correlation matrix
distance_corr = get_dependence_matrix(
    asset_returns, dependence_method="distance_correlation"
)

# Calculate Pearson correlation matrix
pearson_corr = asset_returns.corr()

# Calculate absolute angular distance from a Pearson correlation matrix
abs_angular_dist = get_distance_matrix(pearson_corr, metric="absolute_angular")

Presentation Slides

[Figure: codependence presentation slides]

References