Model Fingerprint Algorithm


Another way to understand a machine learning model is to examine how feature values influence its predictions. Feature effects can be decomposed into three components (fingerprints):

  • Linear component

  • Non-linear component

  • Pairwise interaction component

Yimou Li, David Turkington, and Alireza Yazdani published the paper ‘Beyond the Black Box: An Intuitive Approach to Investment Prediction with Machine Learning’ in the Journal of Financial Data Science. It describes in detail the algorithm for extracting linear, non-linear and pairwise feature effects. This module implements the algorithm described in the article.
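The linear and non-linear components can be illustrated with a minimal sketch. It assumes the effects are computed from a partial dependence curve: the linear effect is the mean absolute deviation of a least-squares line around the curve's average, and the non-linear effect is the mean absolute gap between the curve and that line. The helper names, the quantile grid and the toy model are hypothetical and are not part of this module's API.

```python
import numpy as np


def partial_dependence(predict, X, k, grid):
    """Average prediction over the data with feature k forced to each grid value."""
    out = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, k] = value
        out[i] = predict(X_mod).mean()
    return out


def linear_and_nonlinear_effect(predict, X, k, num_values=50):
    # Grid of feature values taken at evenly spaced quantiles (an assumption).
    grid = np.quantile(X[:, k], np.linspace(0.0, 1.0, num_values))
    pd_values = partial_dependence(predict, X, k, grid)
    # Least-squares line through the partial dependence curve.
    slope, intercept = np.polyfit(grid, pd_values, deg=1)
    line = slope * grid + intercept
    linear = np.mean(np.abs(line - pd_values.mean()))
    non_linear = np.mean(np.abs(pd_values - line))
    return float(linear), float(non_linear)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))


def toy_model(X):
    # Feature 0 enters linearly, feature 1 enters quadratically.
    return 2.0 * X[:, 0] + X[:, 1] ** 2


lin0, nonlin0 = linear_and_nonlinear_effect(toy_model, X, k=0)
lin1, nonlin1 = linear_and_nonlinear_effect(toy_model, X, k=1)
```

In this toy setup the first feature should show an almost purely linear effect, while the second should be dominated by its non-linear effect.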

Tip

  • This algorithm is one of the tools our team uses the most. Two classes, RegressionModelFingerprint and ClassificationModelFingerprint, inherit from an abstract base class; you only need to instantiate the child classes.

  • This algorithm is also a favourite of multiple award-winning hedge funds!


Implementation

class AbstractModelFingerprint

Model fingerprint constructor.

This is an abstract base class for the RegressionModelFingerprint and ClassificationModelFingerprint classes.

fit(model: object, X: DataFrame, num_values: int = 50, pairwise_combinations: list | None = None) → None

Estimate the linear, non-linear and pairwise effects.

Parameters:
  • model – (object) Trained model.

  • X – (pd.DataFrame) Dataframe of features.

  • num_values – (int) Number of values used to estimate feature effect.

  • pairwise_combinations – (list) Tuples (feature_i, feature_j) to test pairwise effect.
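As a small illustration of the pairwise_combinations format, the list of (feature_i, feature_j) tuples can be built by hand or generated with the standard library. The feature names below are taken from the example on this page; testing every pair is shown only as one option, since the number of pairs grows quadratically with the number of features.

```python
from itertools import combinations

# Feature names as they would appear in the columns of X.
feature_names = ["MedInc", "HouseAge", "AveRooms", "AveBedrms"]

# Every unordered pair of features, as a list of (feature_i, feature_j) tuples.
all_pairs = list(combinations(feature_names, 2))

# A hand-picked subset is usually cheaper to evaluate.
selected_pairs = [("MedInc", "AveRooms"), ("HouseAge", "AveBedrms")]
```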

get_effects() → Tuple

Return the computed linear, non-linear and pairwise effects. Call fit() before using this method.

Returns:

(tuple) Linear, non-linear and pairwise effects, each a dictionary (raw and normalised values).

plot_effects(sort_by: str = 'lin', normalized=False) → figure

Plot each effect (linear, non-linear) on a bar plot, normalized if requested. Pairwise effects are also plotted if they were calculated. By default, the results are sorted by linear effect values.

Parameters:
  • sort_by – (str) Choose the effect (‘lin’ or ‘non-lin’) that the results will be sorted by.

  • normalized – (bool) Choose whether the plot results should be normalized or not. Values are normalized across all variables.

Returns:

(plt.figure) Plot figure.
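One plausible reading of "normalized across all variables" is that each raw effect is divided by the sum of that effect over all features, so the normalized values add up to one. The sketch below illustrates that reading only; the function name and the input dictionary are hypothetical, and the module may normalize differently.

```python
def normalize_effects(raw_effects):
    """Scale effects so they sum to one across features (assumed convention)."""
    total = sum(raw_effects.values())
    return {name: value / total for name, value in raw_effects.items()}


# Hypothetical raw linear effects for three features.
raw = {"MedInc": 3.0, "HouseAge": 1.0, "AveRooms": 1.0}
normalized = normalize_effects(raw)
```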

class ClassificationModelFingerprint

Fingerprint class for classification models.

class RegressionModelFingerprint

Fingerprint class for regression models.
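The pairwise interaction component can be sketched in the same spirit. This is a minimal illustration, assuming the interaction effect is the mean absolute difference between the centred joint partial dependence of two features and the sum of their centred individual partial dependences. All names, the quantile grid and the toy models are hypothetical and do not describe the internals of the classes above.

```python
import numpy as np


def mean_prediction(predict, X, columns, values):
    """Average prediction with the given columns forced to fixed values."""
    X_mod = X.copy()
    for col, val in zip(columns, values):
        X_mod[:, col] = val
    return predict(X_mod).mean()


def pairwise_effect(predict, X, feat_a, feat_b, num_values=10):
    # Evenly spaced quantile grids for both features (an assumption).
    levels = np.linspace(0.0, 1.0, num_values)
    grid_a = np.quantile(X[:, feat_a], levels)
    grid_b = np.quantile(X[:, feat_b], levels)

    # Individual and joint partial dependence on the grids.
    pd_a = np.array([mean_prediction(predict, X, [feat_a], [a]) for a in grid_a])
    pd_b = np.array([mean_prediction(predict, X, [feat_b], [b]) for b in grid_b])
    pd_ab = np.array(
        [[mean_prediction(predict, X, [feat_a, feat_b], [a, b]) for b in grid_b]
         for a in grid_a]
    )

    # Centre each function so that only the shape is compared.
    pd_a = pd_a - pd_a.mean()
    pd_b = pd_b - pd_b.mean()
    pd_ab = pd_ab - pd_ab.mean()

    # What the joint curve contains beyond the two individual curves.
    return float(np.mean(np.abs(pd_ab - pd_a[:, None] - pd_b[None, :])))


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

additive = pairwise_effect(lambda Z: Z[:, 0] + Z[:, 1], X, 0, 1)
interacting = pairwise_effect(lambda Z: Z[:, 0] * Z[:, 1], X, 0, 1)
```

A purely additive toy model should give an interaction effect of essentially zero, while a multiplicative one should give a clearly positive value.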


Example

>>> # Import packages
>>> import pandas as pd
>>> from sklearn.datasets import fetch_california_housing
>>> from sklearn.ensemble import RandomForestRegressor
>>> # Import MlFinlab tools
>>> from mlfinlab.feature_importance.fingerprint import RegressionModelFingerprint
>>> # Get a dataset
>>> data = fetch_california_housing()
>>> X = pd.DataFrame(columns=data["feature_names"], data=data["data"])
>>> y = pd.Series(data["target"])
>>> # Fit the model
>>> reg = RandomForestRegressor(n_estimators=3, random_state=42)
>>> reg = reg.fit(X, y)
>>> # Create the fingerprint model
>>> reg_fingerprint = RegressionModelFingerprint()
>>> # Fit the fingerprint model
>>> _ = reg_fingerprint.fit(
...     reg,
...     X,
...     num_values=20,
...     pairwise_combinations=[
...         ("MedInc", "AveRooms"),
...         ("HouseAge", "AveBedrms"),
...         ("Population", "Latitude"),
...     ],
... )
>>> # Get linear, non-linear and pairwise effects
>>> linear_effect, non_linear_effect, pair_wise_effect = reg_fingerprint.get_effects()
>>> # Plot the results
>>> fig = reg_fingerprint.plot_effects(sort_by="non_lin")
>>> fig.show()

References

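  • Li, Y., Turkington, D. and Yazdani, A. ‘Beyond the Black Box: An Intuitive Approach to Investment Prediction with Machine Learning’, Journal of Financial Data Science.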