Model Fingerprint Algorithm
Another way to better understand a machine learning model is to examine how feature values influence its predictions. Feature effects can be decomposed into three components (fingerprints):
- Linear component
- Non-linear component
- Pairwise interaction component
Yimou Li, David Turkington, and Alireza Yazdani published the paper ‘Beyond the Black Box: An Intuitive Approach to Investment Prediction with Machine Learning’ in the Journal of Financial Data Science. It describes in detail the algorithm for extracting linear, non-linear, and pairwise feature effects. This module implements the algorithm described in that paper.
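To make the decomposition more concrete, here is a minimal sketch of how the linear and non-linear components could be estimated from a partial dependence function. It is an illustration under stated assumptions, not the module's actual code: the helper names (`partial_dependence`, `linear_and_nonlinear_effect`), the evenly spaced grid, and the toy `predict` function are all hypothetical. The linear effect is taken as the mean absolute deviation of a least-squares line fitted to the partial dependence, and the non-linear effect as the mean absolute deviation of the partial dependence from that line.

```python
import numpy as np


def partial_dependence(predict, X, col, grid):
    """Average prediction when column `col` is forced to each grid value."""
    values = np.empty(len(grid))
    for i, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, col] = v  # overwrite the feature for every sample
        values[i] = predict(X_mod).mean()
    return values


def linear_and_nonlinear_effect(predict, X, col, num_values=50):
    """Hypothetical helper: split one feature's effect into two parts."""
    # Assumption: an evenly spaced grid between the observed min and max.
    grid = np.linspace(X[:, col].min(), X[:, col].max(), num_values)
    pd_values = partial_dependence(predict, X, col, grid)

    # Least-squares line through the partial dependence curve.
    slope, intercept = np.polyfit(grid, pd_values, 1)
    linear_fit = slope * grid + intercept

    # Linear effect: how far the fitted line moves around the average.
    linear = np.mean(np.abs(linear_fit - pd_values.mean()))
    # Non-linear effect: what the line fails to capture.
    non_linear = np.mean(np.abs(pd_values - linear_fit))
    return linear, non_linear


# Toy model (an assumption for illustration): linear in feature 0,
# quadratic in feature 1.
X_toy = np.column_stack([np.linspace(0, 1, 21), np.linspace(-1, 1, 21)])


def toy_predict(X):
    return 2 * X[:, 0] + X[:, 1] ** 2


lin_0, non_lin_0 = linear_and_nonlinear_effect(toy_predict, X_toy, col=0)
lin_1, non_lin_1 = linear_and_nonlinear_effect(toy_predict, X_toy, col=1)
```

In this toy setup, feature 0 should show an almost purely linear fingerprint and feature 1 an almost purely non-linear one, because the quadratic term is symmetric around zero on the chosen grid.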
Tip
- I would like to highlight that this algorithm is one of the tools that our team uses the most! Two classes inherit from an abstract base class; you only need to instantiate the child classes.
- This algorithm is also a favourite of multiple award-winning hedge funds!
Implementation
- class AbstractModelFingerprint
-
Model fingerprint constructor.
This is an abstract base class for the RegressionModelFingerprint and ClassificationModelFingerprint classes.
- fit(model: object, X: DataFrame, num_values: int = 50, pairwise_combinations: list | None = None) None
-
Estimate the linear, non-linear, and pairwise effects.
- Parameters:
-
-
model – (object) Trained model.
-
X – (pd.DataFrame) Dataframe of features.
-
num_values – (int) Number of values used to estimate feature effect.
-
pairwise_combinations – (list) Tuples (feature_i, feature_j) to test pairwise effect.
-
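As a rough illustration of what a pairwise effect for one `(feature_i, feature_j)` tuple measures, the sketch below compares the de-meaned joint partial dependence of two features with the sum of their de-meaned individual partial dependences. This is a simplified sketch, not the library's implementation: `pairwise_effect`, the evenly spaced grids, and the toy prediction functions are hypothetical names and choices made for the example.

```python
import numpy as np


def pairwise_effect(predict, X, col_a, col_b, num_values=20):
    """Hypothetical helper: interaction strength between two features."""
    grid_a = np.linspace(X[:, col_a].min(), X[:, col_a].max(), num_values)
    grid_b = np.linspace(X[:, col_b].min(), X[:, col_b].max(), num_values)

    def average_prediction(overrides):
        # Force the given columns to fixed values, then average predictions.
        X_mod = X.copy()
        for col, value in overrides.items():
            X_mod[:, col] = value
        return predict(X_mod).mean()

    pd_a = np.array([average_prediction({col_a: a}) for a in grid_a])
    pd_b = np.array([average_prediction({col_b: b}) for b in grid_b])
    pd_ab = np.array(
        [[average_prediction({col_a: a, col_b: b}) for b in grid_b]
         for a in grid_a]
    )

    # De-mean each function so that only the shape is compared.
    joint = pd_ab - pd_ab.mean()
    additive = (pd_a - pd_a.mean())[:, None] + (pd_b - pd_b.mean())[None, :]
    # Whatever the additive part cannot explain is attributed to interaction.
    return np.mean(np.abs(joint - additive))


# Toy data and models (assumptions for illustration only).
X_toy = np.column_stack([np.linspace(0, 1, 21), np.linspace(-1, 1, 21)])


def additive_model(X):
    return 2 * X[:, 0] + X[:, 1] ** 2


def interacting_model(X):
    return X[:, 0] * X[:, 1]


effect_additive = pairwise_effect(additive_model, X_toy, 0, 1)
effect_interacting = pairwise_effect(interacting_model, X_toy, 0, 1)
```

A purely additive model should yield a pairwise effect close to zero, while the product model should yield a clearly positive one. The cost grows with `num_values ** 2` per pair, which is one reason to pass only the pairs of interest.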
- get_effects() Tuple
-
Return the computed linear, non-linear, and pairwise effects. Call fit() before using this method.
- Returns:
-
(tuple) Linear, non-linear, and pairwise effects, each a dictionary holding raw and normalised values.
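The relationship between raw and normalised values can be pictured with the short sketch below. It assumes that normalising means dividing each feature's raw effect by the sum over all features, so that the normalised values add up to 1. The helper `normalize_effects`, the `"raw"` and `"norm"` keys, and the numbers are illustrative assumptions, not confirmed details of the returned dictionaries.

```python
def normalize_effects(raw):
    """Hypothetical helper: scale effects so they sum to 1 across features."""
    total = sum(raw.values())
    if total == 0:
        # Avoid division by zero when no feature has any effect.
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


# Made-up raw effects, for illustration only.
raw_linear = {"MedInc": 3.0, "HouseAge": 1.0}

# Assumed layout: one dictionary with raw and normalised values side by side.
linear_effect = {"raw": raw_linear, "norm": normalize_effects(raw_linear)}
```

Normalised values make features comparable within one effect type, whereas raw values stay in the units of the model's predictions.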
- plot_effects(sort_by: str = 'lin', normalized=False) figure
-
Plot the linear and non-linear effects (optionally normalized) as bar plots. Pairwise effects are also plotted if they were calculated. By default, the results are sorted by linear effect values.
- Parameters:
-
-
sort_by – (str) Choose the effect (‘lin’ or ‘non-lin’) that the results will be sorted by.
-
normalized – (bool) Choose whether the plot results should be normalized or not. Values are normalized across all variables.
-
- Returns:
-
(plt.figure) Plot figure.
- class ClassificationModelFingerprint
-
Classification Fingerprint class used for classification type of models.
- class RegressionModelFingerprint
-
Regression Fingerprint class used for regression type of models.
Example
>>> # Import packages
>>> import pandas as pd
>>> from sklearn.datasets import fetch_california_housing
>>> from sklearn.ensemble import RandomForestRegressor
>>> # Import MlFinlab tools
>>> from mlfinlab.feature_importance.fingerprint import RegressionModelFingerprint
>>> # Get a dataset
>>> data = fetch_california_housing()
>>> X = pd.DataFrame(columns=data["feature_names"], data=data["data"])
>>> y = pd.Series(data["target"])
>>> # Fit the model
>>> reg = RandomForestRegressor(n_estimators=3, random_state=42)
>>> reg = reg.fit(X, y)
>>> # Create the fingerprint model
>>> reg_fingerprint = RegressionModelFingerprint()
>>> # Fit the fingerprint model
>>> _ = reg_fingerprint.fit(
... reg,
... X,
... num_values=20,
... pairwise_combinations=[
... ("MedInc", "AveRooms"),
... ("HouseAge", "AveBedrms"),
... ("Population", "Latitude"),
... ],
... )
>>> # Get linear, non-linear, and pairwise effects
>>> linear_effect, non_linear_effect, pair_wise_effect = reg_fingerprint.get_effects()
>>> # Plot the results
>>> fig = reg_fingerprint.plot_effects(sort_by="non_lin")
>>> fig.show()