mlfinlab.codependence.information
Implementations of mutual information (MI) and variation of information (VI) codependence measures from Cornell lecture slides: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes
Module Contents
Functions
- get_optimal_number_of_bins – Calculates the optimal number of bins for discretization based on the number of observations.
- get_mutual_info – Returns the mutual information (MI) between two vectors.
- variation_of_information_score – Returns the variation of information (VI) between two vectors.
- get_optimal_number_of_bins(num_obs: int, corr_coef: float = None) → int
Calculates the optimal number of bins for discretization based on the number of observations (univariate case) and, if a correlation coefficient is supplied, on both the number of observations and the correlation (bivariate case).
The binning rules used in this function were originally proposed by Hacine-Gharbi et al. (2012) and Hacine-Gharbi and Ravier (2018). They are described in the Cornell lecture notes: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes (p. 26).
- Parameters:
  - num_obs – (int) Number of observations.
  - corr_coef – (float) Correlation coefficient, used to estimate the number of bins in the bivariate case. (None by default)
- Returns:
  - (int) Optimal number of bins.
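For reference, below is a minimal sketch of the two binning rules from the lecture notes (p. 26), assuming natural logarithms and |corr_coef| < 1. The helper name optimal_bins_sketch is hypothetical; this is not the mlfinlab implementation itself:

    import numpy as np

    def optimal_bins_sketch(num_obs: int, corr_coef: float = None) -> int:
        # Sketch of the Hacine-Gharbi binning rules (lecture notes, p. 26).
        if corr_coef is None:
            # Univariate rule, Hacine-Gharbi et al. (2012):
            # zeta = (8 + 324*N + 12*sqrt(36*N + 729*N^2))^(1/3)
            zeta = (8 + 324 * num_obs
                    + 12 * np.sqrt(36 * num_obs + 729 * num_obs ** 2)) ** (1 / 3)
            bins = round(zeta / 6 + 2 / (3 * zeta) + 1 / 3)
        else:
            # Bivariate rule, Hacine-Gharbi and Ravier (2018), assumes |rho| < 1.
            bins = round(2 ** -0.5
                         * np.sqrt(1 + np.sqrt(1 + 24 * num_obs / (1 - corr_coef ** 2))))
        return int(bins)

    print(optimal_bins_sketch(1000))        # univariate rule
    print(optimal_bins_sketch(1000, 0.5))   # bivariate rule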
- get_mutual_info(x: numpy.array, y: numpy.array, n_bins: int = None, normalize: bool = False, estimator: str = 'standard') → float
Returns the mutual information (MI) between two vectors.
This function discretizes the data using the optimal-bins algorithm proposed in the works of Hacine-Gharbi et al. (2012) and Hacine-Gharbi and Ravier (2018).
Read the Cornell lecture notes for more information about mutual information: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.
This function supports several ways of estimating mutual information (a sketch of all three is given after the Returns block below):
- standard – the standard estimator: binning the observations into a given number of bins and applying the MI formula.
- standard_copula – estimating the copula (as a normalized ranking of the observations) and applying the standard MI estimator to it.
- copula_entropy – estimating the copula (as a normalized ranking of the observations) and calculating its entropy; the MI estimate is then (-1) * copula entropy.
The implementation of the last two estimators is taken from a blog post by Dr. Gautier Marti. Read the blog post for more information about the differences between the estimators: https://gmarti.gitlab.io/qfin/2020/07/01/mutual-information-is-copula-entropy.html
- Parameters:
  - x – (np.array) X vector.
  - y – (np.array) Y vector.
  - n_bins – (int) Number of bins for discretization; if None, the optimal number is calculated. (None by default)
  - normalize – (bool) Flag used to normalize the result to [0, 1]. (False by default)
  - estimator – (str) Estimator used for calculation. ['standard', 'standard_copula', 'copula_entropy'] ('standard' by default)
- Returns:
  - (float) Mutual information score.
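To make the three estimators concrete, here is a minimal sketch using numpy, scipy, and scikit-learn, assuming plug-in (binned) entropy estimates. The helper name mutual_info_sketch is hypothetical, and the actual mlfinlab implementation may differ in its details:

    import numpy as np
    from scipy import stats
    from sklearn.metrics import mutual_info_score

    def mutual_info_sketch(x: np.ndarray, y: np.ndarray, n_bins: int,
                           estimator: str = 'standard') -> float:
        # Hypothetical sketch of the three MI estimators described above.
        if estimator in ('standard_copula', 'copula_entropy'):
            # Empirical copula: replace each observation by its normalized rank.
            x = stats.rankdata(x) / len(x)
            y = stats.rankdata(y) / len(y)

        contingency = np.histogram2d(x, y, bins=n_bins)[0]

        if estimator == 'copula_entropy':
            # MI = -H(copula): plug-in entropy of the binned copula sample,
            # shifted by log(n_bins^2) to approximate differential entropy.
            probs = contingency.ravel() / contingency.sum()
            return 2 * np.log(n_bins) - stats.entropy(probs)

        # 'standard' and 'standard_copula': plug-in MI on the binned sample.
        return mutual_info_score(None, None, contingency=contingency)

On binned copula data the standard and copula-entropy estimators roughly coincide (the marginals are uniform, so H(U) + H(V) ≈ 2 * log(n_bins)), which is the relationship discussed in the blog post above.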
- variation_of_information_score(x: numpy.array, y: numpy.array, n_bins: int = None, normalize: bool = False) → float
Returns the variation of information (VI) between two vectors.
This function discretizes the data using the optimal-bins algorithm proposed in the works of Hacine-Gharbi et al. (2012) and Hacine-Gharbi and Ravier (2018).
Read the Cornell lecture notes for more information about the variation of information: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.
- Parameters:
  - x – (np.array) X vector.
  - y – (np.array) Y vector.
  - n_bins – (int) Number of bins for discretization; if None, the optimal number is calculated. (None by default)
  - normalize – (bool) True to normalize the result to [0, 1]. (False by default)
- Returns:
  - (float) Variation of information score.
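As a reference for the quantity being computed, below is a minimal sketch under the same binning approach, using the identity VI(X, Y) = H(X) + H(Y) - 2 * I(X, Y) and normalizing by the joint entropy H(X, Y) = H(X) + H(Y) - I(X, Y). The helper name variation_of_information_sketch is hypothetical:

    import numpy as np
    from scipy import stats
    from sklearn.metrics import mutual_info_score

    def variation_of_information_sketch(x: np.ndarray, y: np.ndarray, n_bins: int,
                                        normalize: bool = False) -> float:
        # Hypothetical sketch: VI(X, Y) = H(X) + H(Y) - 2 * I(X, Y).
        contingency = np.histogram2d(x, y, bins=n_bins)[0]
        mutual_info = mutual_info_score(None, None, contingency=contingency)
        h_x = stats.entropy(np.histogram(x, bins=n_bins)[0])  # marginal entropy H(X)
        h_y = stats.entropy(np.histogram(y, bins=n_bins)[0])  # marginal entropy H(Y)
        score = h_x + h_y - 2 * mutual_info
        if normalize:
            # Dividing by the joint entropy H(X, Y) bounds the metric in [0, 1].
            score /= (h_x + h_y - mutual_info)
        return score

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = x ** 2 + rng.normal(size=1000)        # nonlinear dependence
    print(variation_of_information_sketch(x, y, n_bins=10, normalize=True))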