
This module creates clustered subsets of features, as described in the paper Clustered Feature Importance (Presentation Slides) by Dr. Marcos Lopez de Prado. The approach is also explained in the book Machine Learning for Asset Managers, Snippet 6.5.2, page 84.

Module Contents


get_feature_clusters(→ list)

Machine Learning for Asset Managers

get_feature_clusters(X: pandas.DataFrame, dependence_metric: str, distance_metric: str = None, linkage_method: str = None, n_clusters: int = None, check_silhouette_scores: bool = True, critical_threshold: float = 0.0) → list

Machine Learning for Asset Managers Snippet , page 85. Step 1: Features Clustering

Gets clustered feature subsets from the given set of features.

  • X – (pd.DataFrame) Dataframe of features.

  • dependence_metric – (str) Method to be used for generating the dependence matrix: ‘linear’, ‘information_variation’, ‘mutual_information’, or ‘distance_correlation’.

  • distance_metric – (str) The distance operator to be used for generating the distance matrix. The methods that can be applied are: ‘angular’, ‘squared_angular’, ‘absolute_angular’. Set it to None if the feature clusters are to be generated as-is by the ONC algorithm.

  • linkage_method – (str) Method of linkage to be used for clustering. Methods include: ‘single’, ‘ward’, ‘complete’, ‘average’, ‘weighted’, and ‘centroid’. Set it to None if the feature clusters are to be generated as-is by the ONC algorithm.

  • n_clusters – (int) Number of clusters to form. Must be less than the total number of features. If None, the optimal number of clusters is decided by the ONC algorithm.

  • check_silhouette_scores – (bool) Flag to check whether X contains features with low silhouette scores and, if so, modify it.

  • critical_threshold – (float) Threshold for determining a low silhouette score in the dataset. It can be any real number in [-1, +1]. The default is 0, which means any feature with a silhouette score below 0 will be identified as having a low silhouette score, and the required transformation will be applied to correct it.
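
To make the role of critical_threshold more concrete, the following is a minimal, dependency-free sketch of how features with a low silhouette score could be flagged. It is not the library's implementation: the helper names silhouette_scores and flag_low_silhouette, the toy distance matrix, and the cluster labels are all assumptions made for illustration, and the sketch only identifies low-scoring features without applying any corrective transformation.

```python
def silhouette_scores(dist, labels):
    """Per-item silhouette score from a symmetric distance matrix and cluster labels."""
    n = len(labels)
    scores = []
    for i in range(n):
        # Collect distances from item i to every other item, grouped by cluster.
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster, or no other cluster to compare against: use 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the item's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def flag_low_silhouette(names, dist, labels, critical_threshold=0.0):
    """Return the names whose silhouette score is below the threshold (hypothetical helper)."""
    scores = silhouette_scores(dist, labels)
    return [name for name, s in zip(names, scores) if s < critical_threshold]


# Toy data (assumed): f3 is assigned to cluster 0 but sits much closer to cluster 1.
names = ["f0", "f1", "f2", "f3", "f4"]
labels = [0, 0, 1, 0, 1]
dist = [
    [0.0, 0.1, 0.9, 0.8, 0.9],
    [0.1, 0.0, 0.9, 0.8, 0.9],
    [0.9, 0.9, 0.0, 0.2, 0.1],
    [0.8, 0.8, 0.2, 0.0, 0.2],
    [0.9, 0.9, 0.1, 0.2, 0.0],
]
low = flag_low_silhouette(names, dist, labels, critical_threshold=0.0)
print(low)
```

Raising critical_threshold toward +1 flags more features in this sketch, while lowering it toward -1 flags fewer.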


(list) Feature subsets.
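
The interaction of distance_metric, linkage_method, and n_clusters can be sketched without any dependencies. The example below is an illustration only, not the library's code. It assumes that the three distance names refer to the usual correlation-based distances (angular: sqrt((1 - rho) / 2), absolute angular: sqrt((1 - |rho|) / 2), squared angular: sqrt((1 - rho^2) / 2)), it implements only a naive ‘single’ linkage, and it assumes the returned feature subsets take the form of a list of lists of feature names. The helper names to_distance and single_linkage_subsets and the toy correlation matrix are hypothetical.

```python
import math


def to_distance(rho, metric="angular"):
    """Map a correlation in [-1, 1] to a distance in [0, 1]."""
    if metric == "angular":
        value = 1.0 - rho
    elif metric == "absolute_angular":
        value = 1.0 - abs(rho)
    elif metric == "squared_angular":
        value = 1.0 - rho ** 2
    else:
        raise ValueError(f"unknown distance metric: {metric}")
    return math.sqrt(max(value, 0.0) / 2.0)  # clamp tiny negative rounding errors


def single_linkage_subsets(names, corr, n_clusters, metric="angular"):
    """Merge the two closest clusters until n_clusters feature subsets remain."""
    if not 0 < n_clusters < len(names):
        raise ValueError("n_clusters must be positive and less than the number of features")
    dist = [[to_distance(r, metric) for r in row] for row in corr]
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is that of the closest member pair.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
    return [[names[i] for i in sorted(c)] for c in clusters]


# Toy correlation matrix (assumed): two momentum features and two volatility features.
names = ["mom_1m", "mom_3m", "vol_1m", "vol_3m"]
corr = [
    [1.0, 0.9, -0.2, -0.1],
    [0.9, 1.0, -0.1, -0.2],
    [-0.2, -0.1, 1.0, 0.8],
    [-0.1, -0.2, 0.8, 1.0],
]
subsets = single_linkage_subsets(names, corr, n_clusters=2)
print(subsets)
```

With the absolute or squared variants, strongly negatively correlated features are treated as close, so the choice of distance metric can change which features end up in the same subset.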