Codependence Matrix
The functions in this part of the module are used to generate dependence and distance matrices using the codependency and distance metrics described previously.
- The dependence matrix function computes codependences between the elements of a given dataframe using metrics such as Mutual Information, Variation of Information, Distance Correlation, Spearman’s Rho, GPR distance, and GNPR distance.
- The distance matrix function computes a distance matrix from a given codependence matrix using the angular, squared angular, or absolute angular distance metric.
Note
MlFinLab uses these functions in the clustered feature importance module. They are also used in the PortfolioLab package.
Note
Underlying Literature
The following sources elaborate extensively on the topic:
- Codependence (Presentation Slides) by Marcos Lopez de Prado.
Implementation
get_dependence_matrix(df: DataFrame, dependence_method: str, theta: float = 0.5, n_bins: int | None = None, normalize: bool = True, estimator: str = 'standard', target_dependence: str = 'comonotonicity', gaussian_corr: float = 0.7, var_threshold: float = 0.2) -> DataFrame
This function returns a dependence matrix for elements given in the dataframe using the chosen dependence method.
List of supported algorithms to use for generating the dependence matrix: information_variation, mutual_information, distance_correlation, spearmans_rho, gpr_distance, gnpr_distance, optimal_transport.
Parameters:
- df – (pd.DataFrame) Features.
- dependence_method – (str) Algorithm to be used for generating the dependence matrix.
- theta – (float) Type of information being tested in the GPR and GNPR distances. Falls in the range [0, 1]. (0.5 by default)
- n_bins – (int) Number of bins for discretization in information_variation and mutual_information; if None, the optimal number will be calculated. (None by default)
- normalize – (bool) Flag used to normalize the result to [0, 1] in information_variation and mutual_information. (True by default)
- estimator – (str) Estimator to be used for calculation in mutual_information. [standard, standard_copula, copula_entropy] (standard by default)
- target_dependence – (str) Type of target dependence to use in optimal_transport. [comonotonicity, countermonotonicity, gaussian, positive_negative, different_variations, small_variations] (comonotonicity by default)
- gaussian_corr – (float) Correlation coefficient to use when creating gaussian and small_variations copulas. [from 0 to 1] (0.7 by default)
- var_threshold – (float) Variation threshold to use in small_variations. Sets the relative area of correlation in a copula. [from 0 to 1] (0.2 by default)
Returns:
- (pd.DataFrame) Dependence matrix.
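To make the idea of a pairwise dependence matrix concrete, the following is a minimal sketch that uses only pandas and NumPy. It is not the MlFinLab implementation: the helper names sketch_dependence_matrix and spearman_rho are hypothetical, Spearman's rho is computed as the Pearson correlation of ranks, and setting the diagonal to 1 is an assumption that holds for this particular metric.

```python
import itertools

import numpy as np
import pandas as pd


def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: Spearman's rho as the Pearson correlation of ranks."""
    return x.rank().corr(y.rank())


def sketch_dependence_matrix(df: pd.DataFrame, pair_metric) -> pd.DataFrame:
    """Hypothetical helper: apply a symmetric pairwise metric to every column pair."""
    cols = df.columns
    # Assumption: an element's dependence with itself equals 1.
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for a, b in itertools.combinations(cols, 2):
        value = pair_metric(df[a], df[b])
        out.loc[a, b] = value
        out.loc[b, a] = value  # fill both triangles so the matrix is symmetric
    return out


# Toy data: y is a nonlinear increasing function of x, z decreases as x grows.
toy = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [1.0, 8.0, 27.0, 64.0, 125.0],
        "z": [10.0, 7.0, 6.0, 2.0, 1.0],
    }
)
dependence = sketch_dependence_matrix(toy, spearman_rho)
print(dependence)
```

Because the metric works on ranks, the nonlinear but monotone pair (x, y) receives the same score as a perfectly linear relationship would, which is one reason rank-based and information-based metrics are offered alongside plain correlation.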
get_distance_matrix(dependence_matrix: DataFrame, metric: str = 'angular') -> DataFrame
Applies a distance operator to a dependence matrix.
This makes it possible to turn a correlation matrix into a distance matrix. The distances used are true metrics.
List of supported distance metrics to use for generating the distance matrix: angular, squared_angular, and absolute_angular.
Parameters:
- dependence_matrix – (pd.DataFrame) Dataframe to which the distance operator is applied (a matrix).
- metric – (str) The distance metric used to generate the distance matrix. ['angular', 'squared_angular', 'absolute_angular']
Returns:
- (pd.DataFrame) Distance matrix.
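As an illustration of what a distance operator does to a correlation matrix, here is a minimal sketch under stated assumptions. The helper sketch_distance_matrix is hypothetical, and the formulas are the textbook forms sqrt(0.5 * (1 - rho)), sqrt(1 - |rho|), and sqrt(1 - rho^2); the library's own implementation may differ, for example in the scaling applied under the square root.

```python
import numpy as np
import pandas as pd


def sketch_distance_matrix(corr: pd.DataFrame, metric: str = "angular") -> pd.DataFrame:
    """Hypothetical helper: map a correlation matrix to a distance matrix."""
    if metric == "angular":
        squared = 0.5 * (1.0 - corr)  # assumed form: sqrt(0.5 * (1 - rho))
    elif metric == "absolute_angular":
        squared = 1.0 - corr.abs()  # assumed form: sqrt(1 - |rho|)
    elif metric == "squared_angular":
        squared = 1.0 - corr ** 2  # assumed form: sqrt(1 - rho^2)
    else:
        raise ValueError(f"Unknown metric: {metric}")
    # Clip tiny negative values caused by floating-point error before the root.
    return np.sqrt(squared.clip(lower=0.0))


# Toy correlation matrix with one strongly negative pair.
corr = pd.DataFrame(
    [[1.0, -0.6], [-0.6, 1.0]], index=["A", "B"], columns=["A", "B"]
)
angular = sketch_distance_matrix(corr, "angular")
absolute = sketch_distance_matrix(corr, "absolute_angular")
print(angular)
print(absolute)
```

Under these assumed formulas, the angular metric treats negatively correlated elements as far apart, while the absolute and squared variants treat strong negative correlation as closeness, which matters when the sign of the relationship is irrelevant to the clustering task.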
Example
import pandas as pd
# Import MLFinLab tools
from mlfinlab.codependence.codependence_matrix import (
    get_dependence_matrix,
    get_distance_matrix,
)
# Pull data from example data on Github
url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/stock_prices.csv"
asset_returns = pd.read_csv(url, index_col="Date").pct_change().dropna()
# Calculate distance correlation matrix
distance_corr = get_dependence_matrix(
    asset_returns, dependence_method="distance_correlation"
)
# Calculate Pearson correlation matrix
pearson_corr = asset_returns.corr()
# Calculate absolute angular distance from a Pearson correlation matrix
abs_angular_dist = get_distance_matrix(pearson_corr, metric="absolute_angular")