mlfinlab.feature_importance.orthogonal

Module which implements feature PCA compression and PCA analysis of feature importance.

Module Contents

Functions

get_orthogonal_features(feature_df[, variance_thresh, ...])

Advances in Financial Machine Learning, Snippet 8.5, page 119.

get_pca_rank_weighted_kendall_tau(feature_imp, pca_rank)

Advances in Financial Machine Learning, Snippet 8.6, page 121.

feature_pca_analysis(feature_df, feature_importance[, ...])

Perform correlation analysis between feature importance (MDI for example, supervised) and PCA eigenvalues (unsupervised).

get_orthogonal_features(feature_df, variance_thresh=0.95, num_features=None)

Advances in Financial Machine Learning, Snippet 8.5, page 119.

Computation of Orthogonal Features.

Gets PCA orthogonal features.

Parameters:
  • feature_df – (pd.DataFrame): Dataframe of features.

  • variance_thresh – (float): Fraction of the overall variance that the compressed vectors should explain. Default is 0.95.

  • num_features – (int): Number of features to keep, set manually; overrides variance_thresh. Default is None.

Returns:

(np.array): Compressed PCA features that explain the fraction of variance given by variance_thresh.
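The computation described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the function name orthogonal_features_sketch and the toy data are hypothetical, and it assumes the features are standardized before the eigendecomposition and that components are kept until the cumulative explained variance reaches variance_thresh.

```python
import numpy as np


def orthogonal_features_sketch(X, variance_thresh=0.95, num_features=None):
    """Hypothetical helper: project standardized features onto leading principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each column (population std here; an assumption of this sketch).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigendecomposition of the symmetric matrix Z'Z.
    eig_val, eig_vec = np.linalg.eigh(Z.T @ Z)
    # eigh returns eigenvalues in ascending order; sort them descending.
    order = np.argsort(eig_val)[::-1]
    eig_val, eig_vec = eig_val[order], eig_vec[:, order]
    if num_features is None:
        # Smallest number of components whose cumulative variance reaches the threshold.
        cum_var = np.cumsum(eig_val) / eig_val.sum()
        num_features = int(np.searchsorted(cum_var, variance_thresh)) + 1
    num_features = min(num_features, eig_val.shape[0])
    return Z @ eig_vec[:, :num_features]


# Toy data: three independent columns plus two near-duplicates of the first two.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X_demo = np.hstack([base, base[:, :2] + 0.01 * rng.normal(size=(500, 2))])
P = orthogonal_features_sketch(X_demo, variance_thresh=0.95)
print(P.shape)
```

Because the retained columns are projections onto eigenvectors of Z'Z, they are mutually orthogonal, which is the property the function name refers to. Passing num_features skips the variance threshold, mirroring the documented override.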

get_pca_rank_weighted_kendall_tau(feature_imp, pca_rank)

Advances in Financial Machine Learning, Snippet 8.6, page 121.

Computes Weighted Kendall’s Tau Between Feature Importance and Inverse PCA Ranking.

Parameters:
  • feature_imp – (np.array): Feature mean importance.

  • pca_rank – (np.array): PCA based feature importance rank.

Returns:

(float): Weighted Kendall Tau of feature importance and inverse PCA rank with p_value.
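As a rough illustration of the statistic, the sketch below calls scipy.stats.weightedtau on the feature importances and the inverse of the PCA rank. The wrapper name weighted_tau_vs_pca_rank and the toy values are hypothetical; this shows the idea under those assumptions rather than the library's own code.

```python
import numpy as np
from scipy.stats import weightedtau


def weighted_tau_vs_pca_rank(feature_imp, pca_rank):
    """Hypothetical wrapper: weighted Kendall's tau of importance vs. inverse PCA rank."""
    feature_imp = np.asarray(feature_imp, dtype=float)
    # Inverting the rank makes the top-ranked component (rank 1) the largest value,
    # so agreement with high importances yields a positive tau.
    inverse_rank = 1.0 / np.asarray(pca_rank, dtype=float)
    tau, p_value = weightedtau(feature_imp, inverse_rank)
    return float(tau), float(p_value)


# Toy values: importances in descending order, with the two least
# important features swapped in the PCA ranking.
feature_imp = np.array([0.55, 0.33, 0.07, 0.05])
pca_rank = np.array([1, 2, 4, 3])
tau, p_value = weighted_tau_vs_pca_rank(feature_imp, pca_rank)
print(tau, p_value)
```

The weighted variant penalizes disagreements among the most important features more heavily than disagreements among the least important ones, so a swap at the bottom of the ranking lowers tau only slightly. SciPy does not compute a p-value for this statistic and returns NaN in that slot.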

feature_pca_analysis(feature_df, feature_importance, variance_thresh=0.95)

Perform correlation analysis between feature importance (MDI for example, supervised) and PCA eigenvalues (unsupervised).

A high correlation suggests that the pattern identified by the ML algorithm is probably not entirely the result of overfitting.

Parameters:
  • feature_df – (pd.DataFrame): Features dataframe.

  • feature_importance – (pd.DataFrame): Individual MDI feature importance.

  • variance_thresh – (float): Fraction of the overall variance that the compressed vectors should explain in PCA compression.

Returns:

(dict): Dictionary with kendall, spearman, pearson and weighted_kendall correlations and p_values.
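One way to picture the analysis is the simplified sketch below. It is an assumption-laden illustration, not the library's implementation: the function name pca_importance_correlations, the per-feature PCA score (eigenvalue-weighted absolute loadings summed over the retained components), the dictionary keys, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, rankdata, spearmanr, weightedtau


def pca_importance_correlations(X, importance, variance_thresh=0.95):
    """Hypothetical sketch: correlate supervised importances with a PCA-based score."""
    X = np.asarray(X, dtype=float)
    importance = np.asarray(importance, dtype=float)
    # Standardize, then eigendecompose Z'Z and sort components by eigenvalue.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eig_val, eig_vec = np.linalg.eigh(Z.T @ Z)
    order = np.argsort(eig_val)[::-1]
    eig_val, eig_vec = eig_val[order], eig_vec[:, order]
    # Keep the components needed to reach the variance threshold.
    cum_var = np.cumsum(eig_val) / eig_val.sum()
    k = min(int(np.searchsorted(cum_var, variance_thresh)) + 1, eig_val.shape[0])
    # Assumed per-feature score: eigenvalue-weighted absolute loadings.
    pca_score = np.abs(eig_vec[:, :k] * eig_val[:k]).sum(axis=1)
    pca_rank = rankdata(-pca_score)  # rank 1 = highest PCA score

    results = {}
    for name, func in (("pearson", pearsonr), ("spearman", spearmanr), ("kendall", kendalltau)):
        stat, p_value = func(importance, pca_score)
        results[name] = (float(stat), float(p_value))
    stat, p_value = weightedtau(importance, 1.0 / pca_rank)
    results["weighted_kendall"] = (float(stat), float(p_value))
    return results


# Toy inputs: random features and made-up mean importances.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
importance_demo = np.array([0.4, 0.3, 0.2, 0.1])
corr = pca_importance_correlations(X_demo, importance_demo)
for name, (stat, p_value) in corr.items():
    print(name, stat, p_value)
```

With random features such as these there is no reason to expect a strong correlation; the sketch only shows the shape of the returned dictionary, where each entry pairs a coefficient with its p-value.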