mlfinlab.feature_importance.orthogonal
Module which implements feature PCA compression and PCA analysis of feature importance.
Module Contents
Functions
get_orthogonal_features | Advances in Financial Machine Learning, Snippet 8.5, page 119.
get_pca_rank_weighted_kendall_tau | Advances in Financial Machine Learning, Snippet 8.6, page 121.
feature_pca_analysis | Perform correlation analysis between feature importance (MDI for example, supervised) and PCA eigenvalues (unsupervised).
- get_orthogonal_features(feature_df, variance_thresh=0.95, num_features=None)
Advances in Financial Machine Learning, Snippet 8.5, page 119.
Computation of Orthogonal Features.
Gets PCA orthogonal features.
- Parameters:
feature_df – (pd.DataFrame): Dataframe of features.
variance_thresh – (float): Fraction of the overall variance which the compressed vectors should explain. Default is 0.95.
num_features – (int): Manually set number of features; overrides variance_thresh. Default is None.
- Returns:
(np.array): Compressed PCA features which explain a variance_thresh share of the variance.
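The compression described above can be sketched with plain NumPy. This is an illustrative approximation, not mlfinlab's implementation: the function name `orthogonal_features_sketch` and the sample data are hypothetical, and the sketch assumes features are standardized before the eigen-decomposition and that `variance_thresh` is a fraction between 0 and 1.

```python
import numpy as np


def orthogonal_features_sketch(features, variance_thresh=0.95):
    """Illustrative PCA compression (hypothetical helper, not the library code)."""
    x = np.asarray(features, dtype=float)
    # Standardize each column to zero mean and unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Eigen-decompose the symmetric matrix Z'Z.
    eigvals, eigvecs = np.linalg.eigh(z.T @ z)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the fewest components whose cumulative variance share reaches the threshold.
    cum_var = np.cumsum(eigvals) / eigvals.sum()
    n_keep = min(int(np.searchsorted(cum_var, variance_thresh)) + 1, len(eigvals))
    # Project the standardized features onto the retained eigenvectors.
    return z @ eigvecs[:, :n_keep]


# Made-up data: the fourth column is nearly a copy of the first,
# so it contributes almost no additional variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
features = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=200)])
compressed = orthogonal_features_sketch(features, variance_thresh=0.95)
```

Because the retained columns are projections onto distinct eigenvectors, they are mutually orthogonal, and the near-duplicate feature is absorbed rather than kept as a separate component.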
- get_pca_rank_weighted_kendall_tau(feature_imp, pca_rank)
Advances in Financial Machine Learning, Snippet 8.6, page 121.
Computes the weighted Kendall's tau between feature importance and inverse PCA ranking.
- Parameters:
feature_imp – (np.array): Feature mean importance.
pca_rank – (np.array): PCA based feature importance rank.
- Returns:
(float): Weighted Kendall's tau between feature importance and inverse PCA rank, with its p_value.
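As a rough illustration of the statistic named above, the following sketch calls `scipy.stats.weightedtau` directly on a feature-importance vector and the inverse of a PCA rank vector. The input values are made up for the example, and calling SciPy directly is an assumption about how the quantity can be reproduced, not a statement about the library's internals.

```python
import numpy as np
from scipy.stats import weightedtau

# Made-up inputs: mean importance per feature and its PCA-based rank (1 = top).
feature_imp = np.array([0.55, 0.33, 0.07, 0.05])
pca_rank = np.array([1, 2, 4, 3])

# Inverting the rank makes the top-ranked component the largest value,
# so agreement with high importance shows up as a positive tau.
tau, p_value = weightedtau(feature_imp, 1.0 / pca_rank)
```

The weighting emphasizes agreement among the most important features, so the single swap between the two lowest-ranked features above lowers the statistic only modestly. Note that SciPy currently reports NaN for the p-value of `weightedtau`.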
- feature_pca_analysis(feature_df, feature_importance, variance_thresh=0.95)
Perform correlation analysis between feature importance (MDI for example, supervised) and PCA eigenvalues (unsupervised).
A high correlation suggests that the pattern identified by the ML algorithm is probably not entirely overfit.
- Parameters:
feature_df – (pd.DataFrame): Features dataframe.
feature_importance – (pd.DataFrame): Individual MDI feature importance.
variance_thresh – (float): Fraction of the overall variance which the compressed vectors should explain in PCA compression. Default is 0.95.
- Returns:
(dict): Dictionary with kendall, spearman, pearson and weighted_kendall correlations and p_values.
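The kind of comparison this function reports can be sketched as follows. The helper `correlation_summary`, the input arrays, and the shape of the returned dictionary are all hypothetical; the sketch only shows how Pearson, Spearman, and Kendall correlations between a supervised importance score and an unsupervised PCA-based score could be collected with SciPy, and it omits the weighted Kendall entry.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def correlation_summary(importance, pca_score):
    """Collect (statistic, p_value) pairs per correlation type (hypothetical helper)."""
    results = {}
    for name, func in (("pearson", pearsonr),
                       ("spearman", spearmanr),
                       ("kendall", kendalltau)):
        stat, p_value = func(importance, pca_score)
        results[name] = (float(stat), float(p_value))
    return results


# Made-up scores for five features: supervised importance vs. a PCA-based score.
importance = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
pca_score = np.array([2.1, 1.4, 1.6, 0.5, 0.2])
summary = correlation_summary(importance, pca_score)
```

In this toy input only one pair of features is ordered differently by the two scores, so all three statistics come out strongly positive, which is the situation the description above reads as evidence against overfitting.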