Model Fingerprint Algorithm

Another way to understand a machine learning model better is to examine how feature values influence its predictions. Feature effects can be decomposed into three components (fingerprints):

Linear component
Non-linear component
Pairwise interaction component
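As we read Li, Turkington and Yazdani (2020), all three components are built from the partial dependence function of the fitted model f̂: the average prediction when feature k is fixed to a value and the other m − 1 features keep their observed values. The linear effect measures how far an ordinary least-squares line l̂_k, fitted to the partial dependence curve, moves away from the average prediction; the non-linear effect measures how far the curve departs from that line; the pairwise effect measures what the joint partial dependence of two features adds beyond the sum of their one-way curves (with the partial dependence functions centred). The notation below is our own transcription over N observations and may differ in detail from the article and from this module:

```latex
\hat{f}_k(x_k) = \frac{1}{N}\sum_{i=1}^{N} \hat{f}\big(x_1^{(i)}, \dots, x_{k-1}^{(i)}, x_k, x_{k+1}^{(i)}, \dots, x_m^{(i)}\big)

\text{Linear effect}_k = \frac{1}{N}\sum_{i=1}^{N} \Big| \hat{l}_k\big(x_k^{(i)}\big) - \frac{1}{N}\sum_{j=1}^{N} \hat{f}_k\big(x_k^{(j)}\big) \Big|

\text{Non-linear effect}_k = \frac{1}{N}\sum_{i=1}^{N} \Big| \hat{f}_k\big(x_k^{(i)}\big) - \hat{l}_k\big(x_k^{(i)}\big) \Big|

\text{Pairwise effect}_{k,l} = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \Big| \hat{f}_{k,l}\big(x_k^{(i)}, x_l^{(j)}\big) - \hat{f}_k\big(x_k^{(i)}\big) - \hat{f}_l\big(x_l^{(j)}\big) \Big|
```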

Yimou Li, David Turkington and Alireza Yazdani published the paper ‘Beyond the Black Box: An Intuitive Approach to Investment Prediction with Machine Learning’ in The Journal of Financial Data Science, which describes in detail the algorithm for extracting linear, non-linear and pairwise feature effects. This module implements the algorithm described in that article.
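To make the linear/non-linear split concrete, here is a minimal sketch for a single feature. It assumes the model is exposed as a plain `predict` function on a NumPy array and that the feature is evaluated on `num_values` evenly spaced points between its observed minimum and maximum; that grid choice is our assumption, and the helper names `partial_dependence` and `linear_nonlinear_effect` are hypothetical, not part of the mlfinlab API.

```python
import numpy as np


def partial_dependence(predict, X, col, grid):
    """Average prediction when feature `col` is set to each grid value for every row."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, col] = v  # overwrite the feature for all observations
        values.append(predict(X_mod).mean())
    return np.asarray(values)


def linear_nonlinear_effect(predict, X, col, num_values=50):
    """Split one feature's effect into a linear and a non-linear part (sketch)."""
    # Assumption: evenly spaced grid between the feature's observed min and max.
    grid = np.linspace(X[:, col].min(), X[:, col].max(), num_values)
    pd_values = partial_dependence(predict, X, col, grid)
    # Ordinary least-squares line through the partial dependence curve.
    slope, intercept = np.polyfit(grid, pd_values, deg=1)
    line = slope * grid + intercept
    # Linear effect: mean distance of the fitted line from the average prediction.
    linear = float(np.mean(np.abs(line - pd_values.mean())))
    # Non-linear effect: mean distance of the curve from its linear fit.
    non_linear = float(np.mean(np.abs(pd_values - line)))
    return linear, non_linear


def toy_model(A):
    """Linear in feature 0, quadratic in feature 1."""
    return 3.0 * A[:, 0] + A[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
lin0, nonlin0 = linear_nonlinear_effect(toy_model, X, col=0)
lin1, nonlin1 = linear_nonlinear_effect(toy_model, X, col=1)
print(f"feature 0: linear={lin0:.3f}, non-linear={nonlin0:.3f}")
print(f"feature 1: linear={lin1:.3f}, non-linear={nonlin1:.3f}")
```

On the toy model, feature 0 shows an essentially zero non-linear effect, while the quadratic feature 1 is dominated by its non-linear part. With a pandas DataFrame, one would pass `X.to_numpy()` and wrap the trained model's `predict` method.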

Tip

I would like to highlight that this algorithm is one of the tools our team uses the most! There are two classes that inherit from an abstract base class; you only need to instantiate the child classes.
This algorithm is also a favourite of multiple award-winning hedge funds!

Implementation

class AbstractModelFingerprint

Model fingerprint constructor.

This is an abstract base class for the RegressionModelFingerprint and ClassificationModelFingerprint classes.

fit(model: object, X: DataFrame, num_values: int = 50, pairwise_combinations: list | None = None) → None

Estimate the linear, non-linear and pairwise effects.

Parameters:

model – (object) Trained model.
X – (pd.DataFrame) Dataframe of features.
num_values – (int) Number of values of each feature used to estimate its effect.
pairwise_combinations – (list) List of (feature_i, feature_j) tuples for which the pairwise effect is estimated. If None, pairwise effects are not computed.
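The pairwise effect for one `(feature_i, feature_j)` tuple can be sketched as follows, assuming (as we read the paper) that it compares the centred two-way partial dependence with the sum of the two centred one-way curves. The evenly spaced grids and the helper names `_pd_curve` and `pairwise_effect` are our own hypothetical choices, not mlfinlab's implementation.

```python
import numpy as np


def _pd_curve(predict, X, col, grid):
    """Centred one-way partial dependence of feature `col` on `grid`."""
    curve = np.empty(len(grid))
    for a, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, col] = v
        curve[a] = predict(X_mod).mean()
    return curve - curve.mean()


def pairwise_effect(predict, X, col_i, col_j, num_values=20):
    """Interaction strength of features col_i and col_j (sketch)."""
    # Assumption: evenly spaced grids between each feature's min and max.
    grid_i = np.linspace(X[:, col_i].min(), X[:, col_i].max(), num_values)
    grid_j = np.linspace(X[:, col_j].min(), X[:, col_j].max(), num_values)
    pd_i = _pd_curve(predict, X, col_i, grid_i)
    pd_j = _pd_curve(predict, X, col_j, grid_j)

    # Two-way partial dependence: fix both features, average over the rest.
    pd_ij = np.empty((num_values, num_values))
    for a, vi in enumerate(grid_i):
        for b, vj in enumerate(grid_j):
            X_mod = X.copy()
            X_mod[:, col_i] = vi
            X_mod[:, col_j] = vj
            pd_ij[a, b] = predict(X_mod).mean()
    pd_ij -= pd_ij.mean()

    # What the joint surface adds beyond the sum of the one-way curves.
    return float(np.mean(np.abs(pd_ij - pd_i[:, None] - pd_j[None, :])))


def additive_model(A):
    return A[:, 0] ** 2 + 2.0 * A[:, 1]


def interaction_model(A):
    return A[:, 0] * A[:, 1]


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
additive = pairwise_effect(additive_model, X, 0, 1)
interaction = pairwise_effect(interaction_model, X, 0, 1)
print(additive)     # close to zero: no interaction
print(interaction)  # clearly positive: the product term interacts
```

A purely additive model yields a pairwise effect of (numerically) zero, which is the property that makes this measure isolate interactions. The cost grows with `num_values ** 2` model evaluations per pair, which is one reason to test only selected combinations.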

get_effects() → Tuple

Return the computed linear, non-linear and pairwise effects. fit() must be called before using this method.

Returns: (tuple) Linear, non-linear and pairwise effects, each a dictionary holding both raw and normalised values.
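The exact layout of each returned dictionary is not spelled out on this page. The sketch below assumes a `{"raw": ..., "norm": ...}` structure and that normalisation divides each feature's effect by the total across all features, so the normalised values sum to 1; the helper `with_normalised` and the feature names are hypothetical.

```python
def with_normalised(raw_effects):
    """Hypothetical helper: pair raw per-feature effects with normalised ones."""
    total = sum(raw_effects.values())
    if total == 0:
        norm = {name: 0.0 for name in raw_effects}
    else:
        # Assumption: each effect is divided by the total over all features,
        # so the normalised values sum to 1.
        norm = {name: value / total for name, value in raw_effects.items()}
    return {"raw": dict(raw_effects), "norm": norm}


linear_effect = with_normalised({"feature_a": 2.0, "feature_b": 1.0, "feature_c": 1.0})
print(linear_effect["norm"])  # {'feature_a': 0.5, 'feature_b': 0.25, 'feature_c': 0.25}
```

Normalised values make effects comparable across features on a single scale, while the raw values keep the units of the model's output.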

plot_effects(sort_by: str = 'lin', normalized=False) → figure

Plot the linear and non-linear effects as bar plots; pairwise effects are also plotted if they were calculated. By default, the results are sorted by linear effect values.

Parameters:

sort_by – (str) Choose the effect (‘lin’ or ‘non_lin’) by which the results are sorted.
normalized – (bool) Choose whether the plot results should be normalized or not. Values are normalized across all variables.

Returns:

(plt.figure) Plot figure.
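The `sort_by` option amounts to ordering features by one chosen effect and reusing that order for every panel. A minimal sketch of that ordering step, assuming largest effect first; the helper `sort_features` and its input layout are hypothetical, not the library's internals.

```python
def sort_features(effects, sort_by="lin"):
    """Hypothetical helper: order feature names by one effect, largest first.

    `effects` maps an effect name (e.g. "lin", "non_lin") to a {feature: value} dict.
    """
    if sort_by not in effects:
        raise ValueError(f"sort_by must be one of {sorted(effects)}, got {sort_by!r}")
    chosen = effects[sort_by]
    return sorted(chosen, key=chosen.get, reverse=True)


effects = {
    "lin": {"a": 0.2, "b": 0.5, "c": 0.3},
    "non_lin": {"a": 0.6, "b": 0.1, "c": 0.3},
}
order = sort_features(effects, sort_by="non_lin")
# Apply the same feature order to both bar plots so rows line up.
lin_bars = [effects["lin"][name] for name in order]
non_lin_bars = [effects["non_lin"][name] for name in order]
print(order)  # ['a', 'c', 'b']
```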

class ClassificationModelFingerprint: Fingerprint class for classification models.

class RegressionModelFingerprint: Fingerprint class for regression models.

Example

>>> # Import packages
>>> import pandas as pd
>>> from sklearn.datasets import fetch_california_housing
>>> from sklearn.ensemble import RandomForestRegressor
>>> # Import MlFinlab tools
>>> from mlfinlab.feature_importance.fingerprint import RegressionModelFingerprint
>>> # Get a dataset
>>> data = fetch_california_housing()
>>> X = pd.DataFrame(columns=data["feature_names"], data=data["data"])
>>> y = pd.Series(data["target"])
>>> # Fit the model
>>> reg = RandomForestRegressor(n_estimators=3, random_state=42)
>>> reg = reg.fit(X, y)
>>> # Create the fingerprint model
>>> reg_fingerprint = RegressionModelFingerprint()
>>> # Fit the fingerprint model
>>> _ = reg_fingerprint.fit(
...     reg,
...     X,
...     num_values=20,
...     pairwise_combinations=[
...         ("MedInc", "AveRooms"),
...         ("HouseAge", "AveBedrms"),
...         ("Population", "Latitude"),
...     ],
... )
>>> # Get linear, non-linear and pairwise effects
>>> linear_effect, non_linear_effect, pair_wise_effect = reg_fingerprint.get_effects()
>>> # Plot the results
>>> fig = reg_fingerprint.plot_effects(sort_by="non_lin")
>>> fig.show()



References

Li, Y., Turkington, D. and Yazdani, A., 2020. Beyond the black box: an intuitive approach to investment prediction with machine learning. The Journal of Financial Data Science, 2(1), pp.61-75.