mlfinlab.codependence.correlation

Correlation-based distances and their modifications (angular, absolute, squared), as described in the Cornell lecture notes on codependence: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes

Module Contents

Functions

angular_distance(→ float)

Returns angular distance between two vectors. Angular distance is a slight modification of Pearson correlation which satisfies metric conditions.

absolute_angular_distance(→ float)

Returns absolute angular distance between two vectors. It is a modification of angular distance where the absolute value of the Pearson correlation coefficient is used.

squared_angular_distance(→ float)

Returns squared angular distance between two vectors. It is a modification of angular distance where the square of the Pearson correlation coefficient is used.

distance_correlation(→ float)

Returns distance correlation between two vectors. Distance correlation captures both linear and non-linear dependencies.

kullback_leibler_distance(corr_a, corr_b)

Returns the Kullback-Leibler distance between two correlation matrices. All elements must be positive.

norm_distance(matrix_a, matrix_b[, r_val])

Returns the normalized distance between two matrices.

angular_distance(x: numpy.array, y: numpy.array) → float

Returns angular distance between two vectors. Angular distance is a slight modification of Pearson correlation which satisfies metric conditions.

Formula used for calculation:

Ang_Distance = (1/2 * (1 - Corr))^(1/2)

Read Cornell lecture notes for more information about angular distance: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.

Parameters:
  • x – (np.array/pd.Series) X vector.

  • y – (np.array/pd.Series) Y vector.

Returns:

(float) Angular distance.
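
The formula above can be illustrated with a minimal NumPy sketch. The function name angular_distance_sketch is hypothetical and this is not the library's own implementation; it only assumes that Corr is the Pearson correlation of the two vectors.

```python
import numpy as np


def angular_distance_sketch(x, y):
    """Illustrative re-implementation of the documented formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    # Clip at zero so floating-point error cannot produce a negative radicand.
    return float(np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None)))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(angular_distance_sketch(x, 2 * x + 1))  # perfectly correlated: close to 0
print(angular_distance_sketch(x, -x))         # perfectly anti-correlated: close to 1
```

Under this formula, perfectly anti-correlated vectors are as far apart as possible, which is the behaviour the absolute and squared variants change.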

absolute_angular_distance(x: numpy.array, y: numpy.array) → float

Returns absolute angular distance between two vectors. It is a modification of angular distance where the absolute value of the Pearson correlation coefficient is used.

Formula used for calculation:

Abs_Ang_Distance = (1/2 * (1 - abs(Corr)))^(1/2)

Read Cornell lecture notes for more information about absolute angular distance: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.

Parameters:
  • x – (np.array/pd.Series) X vector.

  • y – (np.array/pd.Series) Y vector.

Returns:

(float) Absolute angular distance.
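
As a rough illustration of the formula above, the following NumPy sketch (with the hypothetical name absolute_angular_distance_sketch, not the library's code) shows that negatively correlated vectors are treated as close, while uncorrelated vectors sit at the maximum distance of sqrt(1/2).

```python
import numpy as np


def absolute_angular_distance_sketch(x, y):
    """Illustrative re-implementation of the documented formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    # Clip at zero so floating-point error cannot produce a negative radicand.
    return float(np.sqrt(np.clip(0.5 * (1.0 - abs(corr)), 0.0, None)))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(absolute_angular_distance_sketch(x, -x))  # sign is ignored: close to 0

# This y is symmetric around the midpoint of x, so its Pearson correlation with x is zero.
y_uncorrelated = np.array([2.0, 1.0, 0.0, 1.0, 2.0])
print(absolute_angular_distance_sketch(x, y_uncorrelated))  # close to sqrt(0.5)
```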

squared_angular_distance(x: numpy.array, y: numpy.array) → float

Returns squared angular distance between two vectors. It is a modification of angular distance where the square of the Pearson correlation coefficient is used.

Formula used for calculation:

Squared_Ang_Distance = (1/2 * (1 - (Corr)^2))^(1/2)

Read Cornell lecture notes for more information about squared angular distance: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.

Parameters:
  • x – (np.array/pd.Series) X vector.

  • y – (np.array/pd.Series) Y vector.

Returns:

(float) Squared angular distance.
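
The formula above can be sketched in NumPy as follows. The name squared_angular_distance_sketch is hypothetical and the code is an illustration of the formula, not the library's implementation.

```python
import numpy as np


def squared_angular_distance_sketch(x, y):
    """Illustrative re-implementation of the documented formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    # Clip at zero so floating-point error cannot produce a negative radicand.
    return float(np.sqrt(np.clip(0.5 * (1.0 - corr ** 2), 0.0, None)))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
print(squared_angular_distance_sketch(x, y))   # partially correlated pair
print(squared_angular_distance_sketch(x, -x))  # sign is ignored: close to 0
```

Because Corr^2 <= abs(Corr) whenever abs(Corr) <= 1, this distance is never smaller than the absolute angular distance for the same pair of vectors.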

distance_correlation(x: numpy.array, y: numpy.array) → float

Returns distance correlation between two vectors. Distance correlation captures both linear and non-linear dependencies.

Formula used for calculation:

Distance_Corr[X, Y] = dCov[X, Y] / (dCov[X, X] * dCov[Y, Y])^(1/2)

dCov[X, Y] is the average Hadamard product of the doubly-centered Euclidean distance matrices of X, Y.

Read Cornell lecture notes for more information about distance correlation: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3512994&download=yes.

Parameters:
  • x – (np.array/pd.Series) X vector.

  • y – (np.array/pd.Series) Y vector.

Returns:

(float) Distance correlation coefficient.
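
The construction described above (doubly-centered distance matrices and their average Hadamard product) can be sketched in NumPy as below. The helper names are hypothetical. The sketch also assumes the common convention in which the average Hadamard product is the squared sample distance covariance, so the coefficient is the square root of the ratio; whether the library applies the same convention is not stated here.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise distance matrix of a 1-D sample, doubly centered (hypothetical helper)."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    d = np.abs(v - v.T)  # Euclidean distance between scalars
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def distance_correlation_sketch(x, y):
    """Illustrative distance correlation; assumes mean(A * B) is the squared dCov."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2_xy = (a * b).mean()  # average Hadamard product
    dcov2_xx = (a * a).mean()
    dcov2_yy = (b * b).mean()
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        return 0.0  # at least one input is constant
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


x = np.linspace(-1.0, 1.0, 201)
y = x ** 2  # purely non-linear dependence
print(np.corrcoef(x, y)[0, 1])            # Pearson correlation is close to 0
print(distance_correlation_sketch(x, y))  # distance correlation is clearly positive
```

The example shows the point made in the description: a quadratic relationship is invisible to Pearson correlation but is picked up by distance correlation.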

kullback_leibler_distance(corr_a, corr_b)

Returns the Kullback-Leibler distance between two correlation matrices. All elements must be positive.

Formula used for calculation:

kullback_leibler_distance[X, Y] = 0.5 * ( Log( det(Y) / det(X) ) + tr((Y ^ -1).X) - n )

Where n is the dimension of the space spanned by X.

Read Don H. Johnson’s research paper for more information on Kullback-Leibler distance: https://scholarship.rice.edu/bitstream/handle/1911/19969/Joh2001Mar1Symmetrizi.PDF

Parameters:
  • corr_a – (np.array/pd.Series/pd.DataFrame) Numpy array of the first correlation matrix.

  • corr_b – (np.array/pd.Series/pd.DataFrame) Numpy array of the second correlation matrix.

Returns:

(np.float64) The Kullback-Leibler distance between the two matrices.
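
A minimal NumPy sketch of the formula above follows. The function name kl_distance_sketch is hypothetical, and the mapping of X to corr_a and Y to corr_b is an assumption made for illustration. The sketch uses numpy.linalg.slogdet and numpy.linalg.solve instead of explicit determinants and inverses, which is a numerical-stability choice rather than something taken from the library.

```python
import numpy as np


def kl_distance_sketch(corr_a, corr_b):
    """Illustrative re-implementation of the documented formula (hypothetical helper).

    Assumes X = corr_a and Y = corr_b, both symmetric positive definite.
    """
    a = np.asarray(corr_a, dtype=float)
    b = np.asarray(corr_b, dtype=float)
    n = a.shape[0]
    _, logdet_a = np.linalg.slogdet(a)  # log(det(X))
    _, logdet_b = np.linalg.slogdet(b)  # log(det(Y))
    trace_term = np.trace(np.linalg.solve(b, a))  # tr(Y^-1 . X)
    return 0.5 * (logdet_b - logdet_a + trace_term - n)


corr_a = np.array([[1.0, 0.5], [0.5, 1.0]])
corr_b = np.array([[1.0, 0.2], [0.2, 1.0]])
print(kl_distance_sketch(corr_a, corr_a))  # identical matrices: close to 0
print(kl_distance_sketch(corr_a, corr_b))
print(kl_distance_sketch(corr_b, corr_a))  # differs from the line above
```

The last two values differ, which shows that this quantity is not symmetric in its arguments and is therefore not a metric in the strict sense.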

norm_distance(matrix_a, matrix_b, r_val=2)

Returns the normalized distance between two matrices.

This function is a wrapper around NumPy's linear algebra function numpy.linalg.norm. Link to documentation: https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html.

Formula used for calculation:

norm_distance[X, Y] = ( sum( abs(X - Y) ^ r ) ) ^ (1/r)

Where r is a parameter: r=1 gives the city block distance (L1 norm), r=2 the Euclidean distance (L2 norm), and r=inf the supremum (L_inf norm). For values of r < 1, the result is not a norm in the mathematical sense.

Parameters:
  • matrix_a – (np.array/pd.Series/pd.DataFrame) Array of the first matrix.

  • matrix_b – (np.array/pd.Series/pd.DataFrame) Array of the second matrix.

  • r_val – (int/str) The r value of the norm formula (any integer; 2 by default).

Returns:

(float) The distance between the two matrices under the chosen norm.
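
The entrywise formula above can be sketched with numpy.linalg.norm as follows. The function name norm_distance_sketch is hypothetical. Note that for 2-D inputs numpy.linalg.norm interprets ord=1, ord=2 and ord=inf as induced matrix norms, so this sketch flattens the difference first to match the entrywise formula; whether the library flattens its inputs is not stated in this documentation.

```python
import numpy as np


def norm_distance_sketch(matrix_a, matrix_b, r_val=2):
    """Entrywise r-norm of the difference (hypothetical helper, not the library code)."""
    diff = np.asarray(matrix_a, dtype=float) - np.asarray(matrix_b, dtype=float)
    # Flatten so that ord is applied as a vector norm: sum(abs(d) ** r) ** (1 / r).
    return float(np.linalg.norm(diff.ravel(), ord=r_val))


a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.zeros((2, 2))
print(norm_distance_sketch(a, b, r_val=1))       # city block: 1 + 2 + 3 + 4
print(norm_distance_sketch(a, b, r_val=2))       # Euclidean: sqrt(1 + 4 + 9 + 16)
print(norm_distance_sketch(a, b, r_val=np.inf))  # supremum: largest absolute entry
```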