mlfinlab.codependence.information
Implementations of mutual information (MI) and variation of information (VI) codependence measures from Cornell lecture slides: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes
Module Contents
Functions
- get_optimal_number_of_bins – Calculates the optimal number of bins for discretization based on the number of observations.
- get_mutual_info – Returns the mutual information (MI) between two vectors.
- variation_of_information_score – Returns the variation of information (VI) between two vectors.
- get_optimal_number_of_bins(num_obs: int, corr_coef: float = None) → int
Calculates the optimal number of bins for discretization based on the number of observations (univariate case) and, if a correlation coefficient is supplied, on both the number of observations and the correlation (bivariate case).
The binning rules used in this function were originally proposed by Hacine-Gharbi et al. (2012) and Hacine-Gharbi and Ravier (2018). They are described in the Cornell lecture notes: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes (p. 26).
- Parameters:
  - num_obs – (int) Number of observations.
  - corr_coef – (float) Correlation coefficient, used to estimate the number of bins in the bivariate case. (None by default)
- Returns:
  - (int) Optimal number of bins.
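For reference, below is a minimal sketch of the two binning rules from the lecture notes (p. 26), assuming natural logarithms and |corr_coef| < 1. The helper name optimal_bins_sketch is hypothetical; this is not the mlfinlab implementation itself:

    import numpy as np

    def optimal_bins_sketch(num_obs: int, corr_coef: float = None) -> int:
        # Sketch of the Hacine-Gharbi binning rules (lecture notes, p. 26).
        if corr_coef is None:
            # Univariate rule, Hacine-Gharbi et al. (2012):
            # zeta = (8 + 324*N + 12*sqrt(36*N + 729*N^2))^(1/3)
            zeta = (8 + 324 * num_obs
                    + 12 * np.sqrt(36 * num_obs + 729 * num_obs ** 2)) ** (1 / 3)
            bins = round(zeta / 6 + 2 / (3 * zeta) + 1 / 3)
        else:
            # Bivariate rule, Hacine-Gharbi and Ravier (2018), assumes |rho| < 1.
            bins = round(2 ** -0.5
                         * np.sqrt(1 + np.sqrt(1 + 24 * num_obs / (1 - corr_coef ** 2))))
        return int(bins)

    print(optimal_bins_sketch(1000))        # univariate rule
    print(optimal_bins_sketch(1000, 0.5))   # bivariate rule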
- get_mutual_info(x: numpy.array, y: numpy.array, n_bins: int = None, normalize: bool = False, estimator: str = 'standard') → float
Returns the mutual information (MI) between two vectors.
This function discretizes the data using the optimal-bins algorithm proposed in the works of Hacine-Gharbi et al. (2012) and Hacine-Gharbi and Ravier (2018).
Read the Cornell lecture notes for more information about mutual information: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.
This function supports several ways of estimating mutual information (a sketch of all three is given after the Returns block below):
- standard – the standard estimator: binning the observations into a given number of bins and applying the MI formula.
- standard_copula – estimating the copula (as a normalized ranking of the observations) and applying the standard MI estimator to it.
- copula_entropy – estimating the copula (as a normalized ranking of the observations) and calculating its entropy; the MI estimate is then (-1) * copula entropy.
The implementation of the last two estimators is taken from a blog post by Dr. Gautier Marti. Read the blog post for more information about the differences between the estimators: https://gmarti.gitlab.io/qfin/2020/07/01/mutual-information-is-copula-entropy.html
- Parameters:
  - x – (np.array) X vector.
  - y – (np.array) Y vector.
  - n_bins – (int) Number of bins for discretization; if None, the optimal number is calculated. (None by default)
  - normalize – (bool) Flag used to normalize the result to [0, 1]. (False by default)
  - estimator – (str) Estimator used for calculation. ['standard', 'standard_copula', 'copula_entropy'] ('standard' by default)
- Returns:
  - (float) Mutual information score.
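To make the three estimators concrete, here is a minimal sketch using numpy, scipy, and scikit-learn, assuming plug-in (binned) entropy estimates. The helper name mutual_info_sketch is hypothetical, and the actual mlfinlab implementation may differ in its details:

    import numpy as np
    from scipy import stats
    from sklearn.metrics import mutual_info_score

    def mutual_info_sketch(x: np.ndarray, y: np.ndarray, n_bins: int,
                           estimator: str = 'standard') -> float:
        # Hypothetical sketch of the three MI estimators described above.
        if estimator in ('standard_copula', 'copula_entropy'):
            # Empirical copula: replace each observation by its normalized rank.
            x = stats.rankdata(x) / len(x)
            y = stats.rankdata(y) / len(y)

        contingency = np.histogram2d(x, y, bins=n_bins)[0]

        if estimator == 'copula_entropy':
            # MI = -H(copula): plug-in entropy of the binned copula sample,
            # shifted by log(n_bins^2) to approximate differential entropy.
            probs = contingency.ravel() / contingency.sum()
            return 2 * np.log(n_bins) - stats.entropy(probs)

        # 'standard' and 'standard_copula': plug-in MI on the binned sample.
        return mutual_info_score(None, None, contingency=contingency)

On binned copula data the standard and copula-entropy estimators roughly coincide (the marginals are uniform, so H(U) + H(V) ≈ 2 * log(n_bins)), which is the relationship discussed in the blog post above.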
- variation_of_information_score(x: numpy.array, y: numpy.array, n_bins: int = None, normalize: bool = False) → float
Returns the variation of information (VI) between two vectors.
This function discretizes the data using the optimal-bins algorithm proposed in the works of Hacine-Gharbi et al. (2012) and Hacine-Gharbi and Ravier (2018).
Read the Cornell lecture notes for more information about the variation of information: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.
- Parameters:
  - x – (np.array) X vector.
  - y – (np.array) Y vector.
  - n_bins – (int) Number of bins for discretization; if None, the optimal number is calculated. (None by default)
  - normalize – (bool) True to normalize the result to [0, 1]. (False by default)
- Returns:
  - (float) Variation of information score.
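As a reference for the quantity being computed, below is a minimal sketch under the same binning approach, using the identity VI(X, Y) = H(X) + H(Y) - 2 * I(X, Y) and normalizing by the joint entropy H(X, Y) = H(X) + H(Y) - I(X, Y). The helper name variation_of_information_sketch is hypothetical:

    import numpy as np
    from scipy import stats
    from sklearn.metrics import mutual_info_score

    def variation_of_information_sketch(x: np.ndarray, y: np.ndarray, n_bins: int,
                                        normalize: bool = False) -> float:
        # Hypothetical sketch: VI(X, Y) = H(X) + H(Y) - 2 * I(X, Y).
        contingency = np.histogram2d(x, y, bins=n_bins)[0]
        mutual_info = mutual_info_score(None, None, contingency=contingency)
        h_x = stats.entropy(np.histogram(x, bins=n_bins)[0])  # marginal entropy H(X)
        h_y = stats.entropy(np.histogram(y, bins=n_bins)[0])  # marginal entropy H(Y)
        score = h_x + h_y - 2 * mutual_info
        if normalize:
            # Dividing by the joint entropy H(X, Y) bounds the metric in [0, 1].
            score /= (h_x + h_y - mutual_info)
        return score

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = x ** 2 + rng.normal(size=1000)        # nonlinear dependence
    print(variation_of_information_sketch(x, y, n_bins=10, normalize=True))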