mlfinlab.clustering.feature_clusters
This module creates clustered subsets of features, as described in the paper Clustered Feature Importance (Presentation Slides) by Dr. Marcos Lopez de Prado (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3517595). The method is also explained in the book Machine Learning for Asset Managers, Snippet 6.5.2, page 84.
Module Contents
Functions
- get_feature_clusters(X: pandas.DataFrame, dependence_metric: str, distance_metric: str = None, linkage_method: str = None, n_clusters: int = None, check_silhouette_scores: bool = True, critical_threshold: float = 0.0) → list
Machine Learning for Asset Managers, Snippet 6.5.2.1, page 85. Step 1: Features Clustering.
Gets clustered feature subsets from the given set of features.
- Parameters:
X – (pd.DataFrame) Dataframe of features.
dependence_metric – (str) Method used to generate the dependence matrix: ‘linear’, ‘information_variation’, ‘mutual_information’, or ‘distance_correlation’.
distance_metric – (str) Distance operator used to generate the distance matrix: ‘angular’, ‘squared_angular’, or ‘absolute_angular’. Set it to None to have the ONC algorithm generate the clusters directly.
linkage_method – (str) Linkage method used for hierarchical clustering: ‘single’, ‘ward’, ‘complete’, ‘average’, ‘weighted’, or ‘centroid’. Set it to None to have the ONC algorithm generate the clusters directly.
n_clusters – (int) Number of clusters to form. Must be less than the total number of features. If None, the optimal number of clusters determined by the ONC algorithm is used.
check_silhouette_scores – (bool) Flag to check whether X contains features with low silhouette scores and, if so, transform them.
critical_threshold – (float) Threshold for identifying low silhouette scores in the dataset. It can be any real number in [-1, +1]. The default is 0, meaning any feature with a silhouette score below 0 is identified as having a low silhouette score, and the required transformation is applied to correct it.
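The ‘angular’, ‘squared_angular’ and ‘absolute_angular’ options turn a correlation-style dependence matrix into a distance matrix. Below is a minimal sketch, assuming the ‘linear’ dependence is the Pearson correlation matrix and assuming the common angular forms sqrt((1 - f(rho)) / 2) with f(rho) = rho, rho**2 or |rho|. The helper `correlation_to_distance` is hypothetical and is not the mlfinlab implementation, whose exact constants may differ.

```python
import numpy as np
import pandas as pd


def correlation_to_distance(corr: pd.DataFrame, distance_metric: str = "angular") -> pd.DataFrame:
    """Hypothetical helper: map a correlation matrix to an angular-type distance matrix.

    Assumed forms (not taken from mlfinlab's source):
      angular          d = sqrt((1 - rho) / 2)
      squared_angular  d = sqrt((1 - rho**2) / 2)
      absolute_angular d = sqrt((1 - |rho|) / 2)
    """
    rho = corr.to_numpy(dtype=float)
    if distance_metric == "angular":
        gap = 1.0 - rho
    elif distance_metric == "squared_angular":
        gap = 1.0 - rho ** 2
    elif distance_metric == "absolute_angular":
        gap = 1.0 - np.abs(rho)
    else:
        raise ValueError(f"Unknown distance_metric: {distance_metric!r}")
    # Clip tiny negative values caused by floating-point error before the square root.
    dist = np.sqrt(np.clip(gap, 0.0, None) / 2.0)
    # Symmetrize and zero the diagonal so the result is a valid distance matrix.
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    return pd.DataFrame(dist, index=corr.index, columns=corr.columns)


# 'linear' dependence is assumed here to be the Pearson correlation matrix.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = pd.DataFrame({
    "a": base + 0.1 * rng.normal(size=500),
    "b": -base + 0.1 * rng.normal(size=500),  # strongly anti-correlated with "a"
    "c": rng.normal(size=500),                 # roughly independent of both
})
corr = X.corr()
for metric in ("angular", "squared_angular", "absolute_angular"):
    print(metric, round(correlation_to_distance(corr, metric).loc["a", "b"], 3))
```

For the strongly anti-correlated pair a and b, ‘angular’ yields a distance near 1, while ‘squared_angular’ and ‘absolute_angular’ yield distances near 0, i.e. the last two treat strong negative correlation as similarity.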
- Returns:
(list) Feature subsets.
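When `linkage_method` and `n_clusters` are given, the feature subsets come from cutting a hierarchical clustering of the features. The sketch below assumes angular distances on a Pearson correlation matrix, uses SciPy's `linkage` and `fcluster` to cut the dendrogram into `n_clusters` groups, returns one list of column names per cluster, and uses scikit-learn's `silhouette_samples` to flag features scoring below `critical_threshold`. The helpers `angular_distance_matrix`, `cluster_features` and `low_silhouette_features` are hypothetical; the ONC path (when these arguments are None) and the correction mlfinlab applies to flagged features are not shown.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_samples


def angular_distance_matrix(X: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation mapped to sqrt((1 - rho) / 2), with an exact zero diagonal."""
    rho = X.corr().to_numpy()
    dist = np.sqrt(np.clip(1.0 - rho, 0.0, None) / 2.0)
    dist = (dist + dist.T) / 2.0  # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)   # silhouette_samples requires a zero diagonal
    return pd.DataFrame(dist, index=X.columns, columns=X.columns)


def cluster_features(dist: pd.DataFrame, linkage_method: str, n_clusters: int) -> list:
    """Hypothetical helper: cut a feature dendrogram into at most n_clusters subsets."""
    if n_clusters >= dist.shape[0]:
        raise ValueError("n_clusters must be less than the number of features")
    # scipy's linkage expects a condensed (upper-triangle) distance vector.
    condensed = squareform(dist.to_numpy(), checks=False)
    tree = linkage(condensed, method=linkage_method)
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # One list of column names per cluster label.
    return [list(dist.columns[labels == k]) for k in np.unique(labels)]


def low_silhouette_features(dist: pd.DataFrame, subsets: list, critical_threshold: float = 0.0) -> list:
    """Hypothetical helper: features whose silhouette score falls below critical_threshold."""
    label_of = {feat: k for k, subset in enumerate(subsets) for feat in subset}
    labels = np.array([label_of[feat] for feat in dist.columns])
    scores = silhouette_samples(dist.to_numpy(), labels, metric="precomputed")
    return [feat for feat, score in zip(dist.columns, scores) if score < critical_threshold]


rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 1000))  # two latent factors


def noisy(signal: np.ndarray) -> np.ndarray:
    """Add independent Gaussian noise (std 0.3) to a latent signal."""
    return signal + 0.3 * rng.normal(size=signal.shape[0])


# x* follow f1, y* follow f2, and z mixes both, so it sits between the two groups.
X = pd.DataFrame({
    "x1": noisy(f1), "x2": noisy(f1), "x3": noisy(f1),
    "y1": noisy(f2), "y2": noisy(f2), "y3": noisy(f2),
    "z": noisy(f1 + f2),
})
dist = angular_distance_matrix(X)
subsets = cluster_features(dist, linkage_method="average", n_clusters=2)
flagged = low_silhouette_features(dist, subsets, critical_threshold=0.1)
print(subsets)
print(flagged)
```

Here the x and y features land in separate subsets, with z attached to one of them. Because z is about equally far from both groups, its silhouette score sits near zero; a slightly positive threshold is used in the example because the sign of such a near-zero score depends on sampling noise.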