Meta-Labeling

The concept of meta-labeling was first introduced by Lopez De Prado (2018) in his book Advances in Financial Machine Learning. Meta-labeling is defined as a secondary ML model that learns how to use a primary model.

Fitting an ML layer that sits on top of a base primary strategy can be used to size positions, filter out false-positive signals, and improve metrics such as the Sharpe ratio and maximum drawdown (Joubert 2022). The secondary model’s target variables are the meta-labels, binary labels with the values “0” or “1” that indicate whether the primary model’s forecast was correct (1) or incorrect (0). The secondary model’s output can then be interpreted as the probability of a profitable trade, and the final position size can be determined from this probability: the higher the probability, the larger the position size. Meta-labeling only filters out potential false positives and does not produce any new trading signals. Therefore, recall (the proportion of all profitable trades that are identified) is sacrificed for a higher degree of precision (the proportion of identified trades that are actually profitable), leading to a higher F1 score and improving the model’s effectiveness (Joubert 2022).
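As a minimal sketch of how such meta-labels are constructed (the function name and data below are illustrative, not part of the library):

```python
import numpy as np

def meta_labels(side, returns):
    """Binary meta-labels: 1 where the primary model's side matched the
    sign of the realized return (a profitable trade), 0 otherwise."""
    side = np.asarray(side, dtype=float)
    returns = np.asarray(returns, dtype=float)
    return (side * returns > 0).astype(int)

# Primary model goes long (+1) or short (-1); realized forward returns:
side = np.array([1, -1, 1, -1])
rets = np.array([0.02, 0.01, -0.03, -0.02])
print(meta_labels(side, rets))  # [1 0 0 1]
```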

Advances in Financial Machine Learning, Chapter 3, page 50, reads:

“Suppose that you have a model for setting the side of the bet (long or short). You just need to learn the size of that bet, which includes the possibility of no bet at all (zero size). This is a situation that practitioners face regularly. We often know whether we want to buy or sell a product, and the only remaining question is how much money we should risk in such a bet. We do not want the ML algorithm to learn the side, just to tell us what is the appropriate size. At this point, it probably does not surprise you to hear that no book or paper has so far discussed this common problem. Thankfully, that misery ends here.”

Why Meta-Labeling is useful

First, ML algorithms are often criticized as black boxes. Meta-labeling allows you to build an ML system on top of a glass box (like a fundamental model founded on economic theory). This ability to transform a fundamental model into an ML model should make meta-labeling particularly useful to “quantamental” firms.

Second, the effects of over-fitting are limited when you apply meta-labeling, because it will not decide the side of your bet, only the size.

Third, by decoupling the side prediction from the size prediction, meta-labeling enables sophisticated strategy structures. For instance, consider that the features driving a rally may differ from the features driving a sell-off. In that case, you may want to develop an ML strategy exclusively for long positions, based on the buy recommendations of a primary model, and an ML strategy exclusively for short positions, based on the sell recommendations of an entirely different primary model.

Fourth, achieving high accuracy on small bets and low accuracy on large bets will ruin you. As important as identifying good opportunities is to size them properly, so it makes sense to develop an ML algorithm solely focused on getting that critical decision (sizing) right.

How to use Meta-Labeling


Before applying meta-labeling to a strategy, the first step is to understand the strategy’s risks (Lopez de Prado 2018, Chapter 15) and to determine what the impact on the Sharpe ratio will be when increasing the precision at the cost of reducing the number of trades. The Sharpe ratio is a function of precision rather than accuracy, as true positives are rewarded, false positives are punished, and negatives are not rewarded, as a position was never taken. Each strategy will have its own risk profile, and not all strategies will benefit from this tradeoff; however, strategies with a high number of trades will generally be more sensitive to a change in precision.

The number of trades plays an important role in the success of meta-labeling for two reasons. First, the secondary model needs enough observations to be trained on and to find a good fit. Second, a high number of trades increases the probability that a small change in precision will have a larger impact on the Sharpe ratio. Lopez de Prado (2019) notes that when 𝜋+ ≫ 𝜋− , the strategy may also be a good candidate for meta-labeling.
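The sensitivity of the Sharpe ratio to precision and trade count can be illustrated under the simplifying assumption of symmetric payouts, as derived in Advances in Financial Machine Learning, Chapter 15. The function below is an illustrative sketch, not library code:

```python
import math

def sharpe_from_precision(p, n):
    """Annualized Sharpe ratio implied by precision p and n bets per year,
    assuming symmetric payouts (cf. AFML, Chapter 15):
    SR = (2p - 1) / (2 * sqrt(p * (1 - p))) * sqrt(n)."""
    if not 0 < p < 1:
        raise ValueError("precision must be in (0, 1)")
    return (2 * p - 1) / (2 * math.sqrt(p * (1 - p))) * math.sqrt(n)

# The same precision gain matters far more when the trade count is high:
print(round(sharpe_from_precision(0.55, 52), 2))    # weekly bets -> 0.72
print(round(sharpe_from_precision(0.55, 2600), 2))  # ~10 bets/day -> 5.12
```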

In the realm of meta-labeling, binary classification introduces a critical trade-off between false positives and false negatives. This trade-off becomes significant as meta-labeling involves leveraging the predictions of a primary classifier to generate labels for a secondary classifier. It is essential to comprehend the impact of improving the true positive rate, which often results in an increase in false positives. Analyzing the receiver operating characteristic (ROC) curve, which illustrates the trade-off at different thresholds, and utilizing the confusion matrix, play vital roles in developing effective meta-labeling strategies.

confusion_matrix.png

The image illustrates the so-called “confusion matrix.” On a set of observations, there are items that exhibit a condition (positives, left rectangle), and items that do not exhibit a condition (negatives, right rectangle). A binary classifier predicts that some items exhibit the condition (ellipse), where the TP area contains the true positives and the TN area contains the true negatives. This leads to two kinds of errors: false positives (FP) and false negatives (FN). “Precision” is the ratio between the TP area and the area in the ellipse. “Recall” is the ratio between the TP area and the area in the left rectangle. This notion of recall (aka true positive rate) is, in the context of classification problems, analogous to “power” in the context of hypothesis testing. “Accuracy” is the sum of the TP and TN areas divided by the overall set of items (square). In general, decreasing the FP area comes at the cost of increasing the FN area, because higher precision typically means fewer calls, hence lower recall. Still, there is some combination of precision and recall that maximizes the overall efficiency of the classifier. The F1-score measures the efficiency of a classifier as the harmonic average between precision and recall.
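These definitions translate directly into code; a small self-contained helper:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # aka true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

p, r, f1, acc = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.8 0.67 0.73 0.7
```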

Model Architecture


There are two main approaches to the development of meta-labeling architectures: feature-driven and strategy-driven.

Feature-driven architectures emphasize the way the information is extracted and exploited from the data using different meta-labeling schemes or arrangements. The primary model is viewed as a black box, and the emphasis is on the input data (information) provided in each layer of the architecture.

Strategy-driven architectures, on the other hand, specifically take into consideration the properties of the underlying primary strategy and aim to apply meta-labeling in a way that best captures and enhances the unique characteristics of the strategy. It should be noted that there is considerable overlap between the two approaches, and the approaches simply indicate the thought process underlying the development of the architecture.

Each architecture aims to account for specific aspects of the meta-labeling process. Further architectures could be developed by combining the different architectures, resulting in more advanced strategies. For example, a feature-driven approach could be combined with a strategy-driven approach to capture all necessary aspects.

primary.png

Exhibit 1 presents a simple architecture of an ML primary model. The primary model can be any strategy such as econometric equations, technical trading rules, fundamental analyses, factor-based strategies, or discretionary trades.

The primary model takes as its inputs features that are indicative of a directional move and aims to predict the side of the trade. The predicted side is given as the labels {−1, 0, 1}, signalling the position to take: −1 for a short position, 0 to close any open position, and 1 for a long position. The initial output from the ML primary model is a continuous value in [−1, 1], and a threshold can be applied as a further step to adjust precision and recall. Values above the threshold are denoted by 1, those below the negative of the threshold by −1, and all other values by 0.

The primary model’s threshold should be adjusted to obtain a well-performing model. This means looking at the classification metrics together (F1 score, accuracy, etc.) if you only want to get the side of the trade, or at strategy metrics such as the Sharpe ratio and maximum drawdown. The strategy for the primary model is developed independently of the choice of secondary model. The threshold should not be tuned simply to achieve a high recall; rather, a model that already has a high recall may be a good candidate for meta-labeling. In the coded example you will find that 0.20 is used as a threshold. In your own applications, we suggest using ROC curves to determine the desired precision/recall trade-off.
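As an illustration of the thresholding step and of sweeping thresholds against a classification metric (all names and data below are synthetic, not library helpers):

```python
import numpy as np

def side_from_output(raw, threshold):
    """Map the primary model's continuous output in [-1, 1] to a side
    in {-1, 0, 1} using a symmetric threshold."""
    return np.where(raw > threshold, 1, np.where(raw < -threshold, -1, 0))

def f1_for_threshold(raw, profitable_long, threshold):
    """F1 of the long signals at a given threshold (illustrative only)."""
    pred = side_from_output(raw, threshold) == 1
    tp = np.sum(pred & profitable_long)
    fp = np.sum(pred & ~profitable_long)
    fn = np.sum(~pred & profitable_long)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

raw = np.array([0.9, 0.3, 0.1, -0.4, -0.05])
profitable_long = np.array([True, True, False, False, False])
best = max(np.arange(0.0, 1.0, 0.05),
           key=lambda t: f1_for_threshold(raw, profitable_long, t))
print(side_from_output(raw, 0.20))   # [ 1  1  0 -1  0]
print(round(float(best), 2))         # 0.1
```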

The secondary model’s aim is to filter out false positives; the primary model only determines the direction of the trade (long, short, or no trade). The primary model also provides model evaluation statistics, which are used in the secondary model as features indicative of false positives. This allows the secondary model to pick up when the primary model is underperforming.

meta_labeling_arch.png

Exhibit 2 presents a general meta-labeling architecture. The target variable is the meta-labels, which indicate whether the directional prediction of the primary model is correct. The features that the secondary model considers can be divided into four distinct components. First, if there is an information advantage, the original data set that was used in the primary model should be included in the secondary model. Second, to measure recent primary model performance, evaluation statistics should be included. Third, market state statistics and regime-related features could be included, as the primary model may not exhibit adequate performance in all market conditions. Lastly, if the primary model is an ML algorithm, the raw output from the primary model can be included to signify model confidence. The last three components contribute to the “modelling for false positives” advantage of applying meta-labeling.

The secondary model could potentially exploit the information used in the primary model, for example if the primary model is linear and the secondary model is non-linear. The original features used in the primary model should therefore be passed on to the secondary model: the features from the first model are concatenated with the predictions from the first model into a new feature set for the secondary model. The target variable in the secondary model is the meta-labels, binary labels {0, 1} that indicate whether the primary model’s forecast was profitable. The secondary model thus makes a meta prediction. The output of the fitted second model is the probability of a true positive in [0, 1], which is used to size positions: the greater the estimated probability of a profitable trade, the larger the position size. This represents a trade-off in which recall is traded for precision, leading to better model efficiency as represented by a higher F1-score (the harmonic mean of precision and recall). Lastly, the prediction from the secondary model is combined with the prediction from the primary model, and only where both are true is the final prediction true.
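This architecture can be sketched end-to-end on synthetic data. Everything below is a hypothetical setup (scikit-learn is used for the secondary model; none of the names come from the library):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic setup: features, a noisy primary side forecast, realized returns.
X = rng.normal(size=(500, 3))
primary_side = np.where(X[:, 0] + 0.5 * rng.normal(size=500) > 0, 1, -1)
returns = 0.01 * X[:, 0] + 0.01 * rng.normal(size=500)

# Meta-labels: 1 where the primary forecast was profitable.
meta_label = (primary_side * returns > 0).astype(int)

# Secondary model sees the original features plus the primary prediction.
X_meta = np.column_stack([X, primary_side])
secondary = LogisticRegression().fit(X_meta, meta_label)

# Probability of a true positive drives position size; trade only where the
# secondary model (probability above 0.5) agrees with the primary side.
prob = secondary.predict_proba(X_meta)[:, 1]
position = np.where(prob > 0.5, primary_side * prob, 0.0)
```

In practice the secondary model would be fit on a training window and applied out of sample; the in-sample fit here is only to keep the sketch short.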

Ensemble Architecture



Implementation

The following functions are used for the triple-barrier method which works in tandem with meta-labeling.

Note

For this section we retained the book's original function names, so that the user can have a smoother journey.

add_vertical_barrier(t_events, close, num_days=0, num_hours=0, num_minutes=0, num_seconds=0, num_bars=None)

Add a vertical barrier.

Add a vertical barrier, which represents a timestamp of the next price bar at or immediately after a number of given time units such as num_days, num_hours etc., for each index in t_events.

Alternatively, a num_bars parameter can be passed to add a vertical barrier after a given number of bars. In such a case, the inputs num_days, num_hours etc., will be ignored. This vertical barrier can be passed as an optional argument t1 in get_events. This function creates a series that has all the timestamps of when the vertical barrier would be reached.


Parameters:
  • t_events – (pd.Series) Series of events (symmetric CUSUM filter).

  • close – (pd.Series) Close prices.

  • num_days – (int) Number of days to add for vertical barrier.

  • num_hours – (int) Number of hours to add for vertical barrier.

  • num_minutes – (int) Number of minutes to add for vertical barrier.

  • num_seconds – (int) Number of seconds to add for vertical barrier.

  • num_bars – (int) Number of bars (samples) after which to construct vertical barriers (None by default).

Returns:

(pd.Series) Timestamps of vertical barriers.

References:

  • Advances in Financial Machine Learning, Snippet 3.4 page 49.
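A simplified sketch of the underlying logic (illustrative only, not the library implementation; it assumes a sorted DatetimeIndex on close):

```python
import pandas as pd

def vertical_barrier_sketch(t_events, close, num_days=1):
    """For each event timestamp, find the first price bar at or after
    t + num_days (cf. AFML Snippet 3.4)."""
    t1 = close.index.searchsorted(t_events + pd.Timedelta(days=num_days))
    t1 = t1[t1 < close.shape[0]]          # drop events past the last bar
    return pd.Series(close.index[t1], index=t_events[:t1.shape[0]])

idx = pd.date_range('2023-01-01', periods=10, freq='D')
close = pd.Series(range(10), index=idx)
t_events = pd.DatetimeIndex([idx[0], idx[3]])
print(vertical_barrier_sketch(t_events, close, num_days=2))
```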

get_events(close, t_events, pt_sl, target, num_threads, vertical_barrier_times=False, side_prediction=None, verbose=True, **kwargs)

Generate a DataFrame of events based on the triple barrier method.

The DataFrame contains the upper and lower boundaries based on the target series multiplied by the profit taking and stop loss levels specified. Optionally, a vertical barrier could also be added. The DataFrame contains ‘t1’, which is the time when the first barriers were reached.

Parameters:
  • close – (pd.Series) Close prices.

  • t_events – (pd.Series) of t_events. These are timestamps that will seed every triple barrier. These are the timestamps selected by the sampling procedures discussed in Chapter 2, Section 2.5. Eg: CUSUM Filter.

  • pt_sl – (2 element array) Element 0, indicates the profit taking level; Element 1 is stop loss level. A non-negative float that sets the width of the two barriers. A 0 value means that the respective horizontal barrier (profit taking and/or stop loss) will be disabled.

  • target – (pd.Series) of absolute values that are used (in conjunction with pt_sl) to determine the width of the barrier. One can for example use a volatility series to determine the width.

  • num_threads – (int) The number of threads concurrently used by the function.

  • vertical_barrier_times – (pd.Series) A pandas series with the timestamps of the vertical barriers. Pass False to disable vertical barriers.

  • side_prediction – (pd.Series) Side of the bet (long/short) as decided by the primary model

  • verbose – (bool) Flag to report progress on async jobs.

Returns:

(pd.DataFrame) DataFrame consisting of the columns t1, trgt, side, pt, and sl.

  • Index - Event’s starttime.

  • t1 - Time when the first barriers were reached.

  • trgt - The event’s target.

  • side - (optional) Implies the algo’s position side.

  • pt - Profit taking multiple.

  • sl - Stop loss multiple.

References:

  • Advances in Financial Machine Learning, Snippet 3.6 page 50.

get_bins(triple_barrier_events, close)

Provides the labels based on the triple-barrier method.

The top horizontal barrier, bottom horizontal barrier, and vertical barrier are labeled {1, -1, 0} respectively. Meta-labeling can also be used, where we give the ‘side’ of the trade; the possible label values are then {0, 1}. The ML algorithm will be trained to decide whether to take the bet or pass, a purely binary prediction. When the predicted label is 1, we can use the probability of this secondary prediction to derive the size of the bet, where the side (sign) of the position has been set by the primary model.

Parameters:
  • triple_barrier_events

    (pd.DataFrame) DataFrame consisting of the columns t1, trgt, side.

    • Index - Event’s starttime.

    • t1 - Time when the first barriers were reached.

    • trgt - The event’s target.

    • side - (optional) Implies the algo’s position side.

    Case 1: (‘side’ not in events): bin in (-1,0,1) <-label by price action. Case 2: (‘side’ in events): bin in (0,1) <-label by pnl (meta-labeling).

  • close – (pd.Series) Close prices.

Returns:

(pd.DataFrame) Meta-labeled events.

References:

  • Advances in Financial Machine Learning, Snippet 3.7, page 51.
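The core of the meta-labeling case can be sketched as follows (illustrative only; it assumes the event timestamps are present in close, and the library version handles more edge cases):

```python
import pandas as pd

def meta_bins_sketch(events, close):
    """Return over [event start, t1], signed by the primary side, mapped
    to a binary meta-label (cf. AFML Snippet 3.7, meta-labeling case)."""
    ret = close.loc[events['t1']].values / close.loc[events.index].values - 1
    ret *= events['side'].values          # meta-labeling: sign by the side
    return pd.Series((ret > 0).astype(int), index=events.index)

idx = pd.date_range('2023-01-01', periods=5, freq='D')
close = pd.Series([100.0, 101.0, 99.0, 102.0, 100.0], index=idx)
events = pd.DataFrame({'t1': [idx[2], idx[4]], 'side': [-1, -1]},
                      index=[idx[0], idx[2]])
print(meta_bins_sketch(events, close).tolist())  # [1, 0]
```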

drop_labels(events, min_pct=0.05)

This function recursively eliminates rare observations.

Parameters:
  • events – (pd.DataFrame) Events.

  • min_pct – (float) Minimum fraction of occurrences; labels that occur less frequently than this fraction are dropped.

Returns:

(pd.DataFrame) Events.

References:
  • Advances in Financial Machine Learning, Snippet 3.8 page 54.
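A sketch of the recursive elimination (illustrative; the column name 'bin' is assumed):

```python
import pandas as pd

def drop_rare_labels(events, min_pct=0.05, col='bin'):
    """Recursively drop observations whose label occurs in fewer than
    min_pct of rows, while more than two labels remain (cf. Snippet 3.8)."""
    while True:
        counts = events[col].value_counts(normalize=True)
        if counts.min() > min_pct or counts.shape[0] < 3:
            return events
        events = events[events[col] != counts.idxmin()]

labels = pd.DataFrame({'bin': [1] * 50 + [-1] * 47 + [0] * 3})
print(drop_rare_labels(labels, min_pct=0.05)['bin'].value_counts().to_dict())
# {1: 50, -1: 47}
```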


Example

Suppose we use a mean-reverting strategy as our primary model, giving each observation a label of -1 or 1. We can then use meta-labeling to act as a filter for the bets of our primary model.

Assuming we have a pandas series with the timestamps of our observations and their respective labels given by the primary model, the process to generate meta-labels goes as follows.

# Import packages
import numpy as np
import pandas as pd

# Import MlFinLab tools
import mlfinlab as ml

# Read in data
data = pd.read_csv('FILE_PATH')

# Compute daily volatility
daily_vol = ml.util.get_daily_vol(close=data['close'], lookback=50)

# Apply Symmetric CUSUM Filter and get timestamps for events
# Note: Only the CUSUM filter needs a point estimate for volatility
cusum_events = ml.filters.cusum_filter(data['close'],
                                       threshold=daily_vol['2011-09-01':'2018-01-01'].mean())

# Compute vertical barrier using timedelta
vertical_barriers = ml.labeling.add_vertical_barrier(t_events=cusum_events,
                                                     close=data['close'],
                                                     num_days=1)

# Another option is to compute the vertical bars after a fixed number of samples
vertical_barriers = ml.labeling.add_vertical_barrier(t_events=cusum_events,
                                                     close=data['close'],
                                                     num_bars=20)

Once we have computed the daily volatility along with our vertical time barriers and have downsampled our series using the CUSUM filter, we can use the triple-barrier method to compute our meta-labels by passing in the side predicted by the primary model.

pt_sl = [1, 2]
min_ret = 0.005
triple_barrier_events = ml.labeling.get_events(close=data['close'],
                                               t_events=cusum_events,
                                               pt_sl=pt_sl,
                                               target=daily_vol,
                                               min_ret=min_ret,
                                               num_threads=3,
                                               vertical_barrier_times=vertical_barriers,
                                               side_prediction=data['side'])

As can be seen above, we have scaled our lower (stop-loss) barrier and set our minimum return to 0.005.

Warning

The biggest mistake we see users making here is that they change the daily targets and min_ret values to get more observations, since ML models require a fair amount of data. This is the wrong approach!

Please visit the Seven-Point Protocol under the Backtest Overfitting Tools section to learn more about how to think about features and outcomes.

Meta-labels can then be computed using the time that each observation touched its respective barrier.

meta_labels = ml.labeling.get_bins(triple_barrier_events, data['close'])

This example ends with creating the meta-labels. To see a further explanation of using these labels in a secondary model to help filter out false positives, see the research notebooks below.


Research Papers

Meta-Labeling: Theory and Framework

Meta-labeling is a machine learning (ML) layer that sits on top of a base primary strategy to help size positions, filter out false-positive signals, and improve metrics such as the Sharpe ratio and maximum drawdown. This article consolidates the knowledge of several publications into a single work, providing practitioners with a clear framework to support the application of meta-labeling to investment strategies. The relationships between binary classification metrics and strategy performance are explained, alongside answers to many frequently asked questions regarding the technique. The author also deconstructs meta-labeling into three components, using a controlled experiment to show how each component helps to improve strategy metrics and what types of features should be considered in the model specification phase.

Meta-Labeling Architecture

Separating the side and size of a position allows for sophisticated strategy structures to be developed. Modeling the size component can be done through a meta-labeling approach. This article establishes several heterogeneous architectures to account for key aspects of meta-labeling. They serve as a guide for practitioners in the model development process, as well as for researchers to further build on these ideas. An architecture can be developed through the lens of feature- and/or strategy-driven approaches. The feature-driven approach exploits the way the information in the data is structured and how the selected models use that information, whereas a strategy-driven approach specifically aims to incorporate unique characteristics of the underlying trading strategy. Furthermore, the concept of inverse meta-labeling is introduced as a technique to improve the quantity and quality of the side forecasts.

Ensemble Meta-Labeling

This study systematically investigates different ensemble methods for meta-labeling in finance and presents a framework to facilitate the selection of ensemble learning models for this purpose. Experiments were conducted on the components of information advantage and modeling for false positives to discover whether ensembles were better at extracting and detecting regimes and whether they increased model efficiency. The authors demonstrate that ensembles are especially beneficial when the underlying data consist of multiple regimes and are nonlinear in nature. The authors’ framework serves as a starting point for further research. They suggest that the use of different fusion strategies may foster model selection. Finally, the authors elaborate on how additional applications, such as position sizing, may benefit from their framework.


Blog Posts

Does Meta Labeling Add to Signal Efficacy?

Successful and long-lasting quantitative research programs require a solid foundation that includes procurement and curation of data, creation of building blocks for feature engineering, state-of-the-art methodologies, and backtesting. In this project we explore an example of applying meta-labeling to high-quality S&P 500 E-Mini Futures data and create a Python package (MlFinLab) that is based on the work of Dr. Marcos Lopez de Prado in his book ‘Advances in Financial Machine Learning’. Dr. de Prado’s book provides a guideline for creating a successful platform. We also implement Trend-Following and Mean-Reverting Bollinger-band-based trading strategies. Our results confirm that a combination of event-based sampling, the triple-barrier method, and meta-labeling improves the performance of the strategies.

Meta Labeling (A Toy Example)

This blog post investigates the idea of Meta Labeling and tries to help build an intuition for what is taking place. The idea of meta-labeling is first mentioned in the textbook Advances in Financial Machine Learning by Marcos Lopez de Prado and promises to improve model and strategy performance metrics by helping to filter-out false positives.

We make use of a computer vision problem known as MNIST handwritten digit classification. By using a non-financial time-series data set we can illustrate the components that make up meta-labeling more clearly. Let's begin!

Github Code From Papers


Research Notebook

The following research notebooks can be used to better understand the triple-barrier method and meta-labeling.

Triple-Barrier Method

  • Chapter 3 Labeling

Meta-Labeling Toy Example

  • Meta Labeling MNIST


Research Article



Presentation Slides

side_size.jpg lecture_32.png

Note

  • pg 12-16: Labeling Techniques

  • pg 17-20: Meta-Labeling



References