Triple-Barrier


The most common labeling method used in financial academia is the fixed-time horizon method. While ubiquitous, this method has several drawbacks, which can be mitigated by using the triple-barrier method. The triple-barrier method can also be extended to incorporate meta-labeling, which is also discussed in this section.

Note

Underlying Literature

The following sources describe this method in more detail:


Triple-Barrier Method


The main concept behind the triple-barrier method is that we have three barriers: an upper barrier, a lower barrier, and a vertical barrier. The upper barrier represents the threshold an observation's return needs to reach in order to be considered a buying opportunity (a label of 1), the lower barrier represents the threshold an observation's return needs to reach in order to be considered a selling opportunity (a label of -1), and the vertical barrier represents the amount of time an observation has to reach either threshold before it is assigned a label of 0 (indicating that no trade should take place). This concept is easier to understand visually, as shown in the figure below taken from Advances in Financial Machine Learning:

[Figure: the triple-barrier method, from Advances in Financial Machine Learning]

One of the major faults of the fixed-time horizon method (in contrast to the triple-barrier method) is that observations are labeled against a fixed threshold after a fixed interval, regardless of their respective volatilities. In other words, the expected returns of every observation are treated equally regardless of the associated risk. The triple-barrier method tackles this issue by setting the upper and lower barriers dynamically for each observation based on its volatility: the barriers are set by multiplying the current estimated volatility by a profit-taking and a stop-loss level, respectively. The volatility series, however, can be replaced with any other series that should be used to set the dynamic thresholds.
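As a minimal sketch of this idea, assuming barriers are measured on simple returns from each observation's starting price, the horizontal barrier price levels follow directly from the close prices, a target series and the two multipliers. The helper name `barrier_levels` and the use of NaN for a disabled barrier are illustrative assumptions, not part of MlFinLab's API:

```python
import pandas as pd


def barrier_levels(close: pd.Series, target: pd.Series, pt_sl: tuple) -> pd.DataFrame:
    """Hypothetical helper: horizontal barrier price levels per observation.

    Assumption: the upper barrier is touched when close[t] / close[t0] - 1 >= pt * target[t0],
    and the lower one when the return falls to -sl * target[t0] or below.
    """
    pt, sl = pt_sl
    # Align the target (e.g. a volatility estimate) with the close prices
    target = target.reindex(close.index)
    out = pd.DataFrame({"close": close, "trgt": target})
    # Barrier width scales with the target; a multiplier of 0 disables that barrier (NaN here)
    out["upper"] = close * (1 + pt * target) if pt > 0 else float("nan")
    out["lower"] = close * (1 - sl * target) if sl > 0 else float("nan")
    return out
```

Because the target enters multiplicatively, an observation with twice the volatility gets barriers twice as far from its price, which is the risk adjustment a single fixed threshold lacks.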

Note

We've further improved the model described in Advances in Financial Machine Learning by Prof. Marcos Lopez de Prado to speed up execution.

Starting from MlFinLab version 1.6.0, execution is up to 2 times faster than in version 1.5.0 and earlier (the speed-up depends on the size of the input dataset).


Implementation

The following functions are used for the triple-barrier method. They can also be used together for the meta-labeling approach.

add_vertical_barrier(t_events, close, num_days=0, num_hours=0, num_minutes=0, num_seconds=0, num_bars=None)

Add a vertical barrier.

For each timestamp in t_events, add a vertical barrier: the timestamp of the first price bar at or immediately after the given time offset (num_days, num_hours, etc.).

Alternatively, a num_bars parameter can be passed to place the vertical barrier a given number of bars after each event. In that case, num_days, num_hours, etc. are ignored. The resulting series can be passed to get_events as the optional vertical_barrier_times argument. This function returns a series containing the timestamps at which each vertical barrier would be reached.


Parameters:
  • t_events – (pd.Series) Series of event timestamps (e.g. from the symmetric CUSUM filter).

  • close – (pd.Series) Close prices.

  • num_days – (int) Number of days to add for vertical barrier.

  • num_hours – (int) Number of hours to add for vertical barrier.

  • num_minutes – (int) Number of minutes to add for vertical barrier.

  • num_seconds – (int) Number of seconds to add for vertical barrier.

  • num_bars – (int) Number of bars (samples) after which to construct vertical barriers (None by default).

Returns:

(pd.Series) Timestamps of vertical barriers.

References:

  • Advances in Financial Machine Learning, Snippet 3.4 page 49.
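To make the construction concrete, here is a rough pandas sketch of the same idea. It is not MlFinLab's implementation: the function name `vertical_barrier_sketch` is hypothetical, it assumes `close` has a sorted DatetimeIndex, and events whose barrier would fall beyond the last bar are simply dropped. The time-offset branch uses `searchsorted` to find the first bar at or after each event time plus the offset, in the spirit of Snippet 3.4:

```python
import pandas as pd


def vertical_barrier_sketch(t_events, close, num_days=0, num_hours=0,
                            num_minutes=0, num_seconds=0, num_bars=None):
    """Hypothetical sketch of vertical-barrier construction (not MlFinLab's code).

    Assumes `close` has a sorted DatetimeIndex.
    """
    t_events = pd.DatetimeIndex(t_events)
    if num_bars is not None:
        # Bar position of each event (first bar at or after it), moved num_bars forward
        positions = close.index.searchsorted(t_events) + num_bars
    else:
        offset = pd.Timedelta(days=num_days, hours=num_hours,
                              minutes=num_minutes, seconds=num_seconds)
        # First bar at or immediately after event time + offset
        positions = close.index.searchsorted(t_events + offset)
    # Events whose barrier would fall beyond the last bar get no vertical barrier
    keep = positions < close.shape[0]
    return pd.Series(close.index[positions[keep]], index=t_events[keep])
```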

get_events(close, t_events, pt_sl, target, num_threads, vertical_barrier_times=False, side_prediction=None, verbose=True, **kwargs)

Generate a DataFrame of events based on the triple barrier method.

The DataFrame contains the upper and lower boundaries based on the target series multiplied by the specified profit-taking and stop-loss levels. Optionally, a vertical barrier can also be added. The DataFrame contains 't1', the time at which the first barrier was touched.

Parameters:
  • close – (pd.Series) Close prices.

  • t_events – (pd.Series) Timestamps that seed each triple barrier, typically selected by the sampling procedures discussed in Chapter 2, Section 2.5 (e.g. the CUSUM filter).

  • pt_sl – (2 element array) Element 0 is the profit-taking level; element 1 is the stop-loss level. Each is a non-negative float that sets the width of the respective barrier. A value of 0 disables the respective horizontal barrier (profit taking and/or stop loss).

  • target – (pd.Series) Absolute values used, together with pt_sl, to set the width of the barriers. A volatility series, for example, can be used.

  • num_threads – (int) The number of threads used concurrently by the function.

  • vertical_barrier_times – (pd.Series) Timestamps of the vertical barriers. Pass False to disable vertical barriers.

  • side_prediction – (pd.Series) Side of the bet (long/short) as decided by the primary model.

  • verbose – (bool) Flag to report progress on asynchronous jobs.

Returns:

(pd.DataFrame) DataFrame consisting of the columns t1, trgt, side, pt, and sl.

  • Index - Event's start time.

  • t1 - Time when the first barrier was touched.

  • trgt - The event's target.

  • side - (optional) The algorithm's position side.

  • pt - Profit taking multiple.

  • sl - Stop loss multiple.

References:

  • Advances in Financial Machine Learning, Snippet 3.6 page 50.
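The core of the method is a first-touch search along each event's price path. The sketch below illustrates that search in plain pandas, one event at a time. It is a hypothetical, single-threaded illustration (the name `first_touch_sketch` is ours, not MlFinLab's), and it assumes barriers are measured on simple returns from the event's start price, signed by the side of the bet when one is given:

```python
import numpy as np
import pandas as pd


def first_touch_sketch(close, t_events, pt_sl, target,
                       vertical_barrier_times=None, side=None):
    """Hypothetical, single-threaded sketch of the first-touch search.

    Not MlFinLab's get_events: no multithreading and no input validation.
    Barriers are measured on simple returns from the event's start price.
    """
    pt, sl = pt_sl
    columns = ["t0", "t1", "trgt", "pt", "sl"] + (["side"] if side is not None else [])
    records = []
    for t0 in t_events:
        trgt = target.get(t0, np.nan)
        if pd.isna(trgt):
            continue  # events without a target are skipped
        # Vertical barrier: the given timestamp, else the last available bar
        end = close.index[-1]
        if vertical_barrier_times is not None and pd.notna(vertical_barrier_times.get(t0)):
            end = vertical_barrier_times[t0]
        sign = 1.0 if side is None else float(side[t0])
        # Simple returns along the path t0..end, signed by the side of the bet
        path = (close.loc[t0:end] / close.loc[t0] - 1) * sign
        # Earliest touch of each horizontal barrier; a multiplier of 0 disables it
        pt_touch = path[path >= pt * trgt].index.min() if pt > 0 else pd.NaT
        sl_touch = path[path <= -sl * trgt].index.min() if sl > 0 else pd.NaT
        # t1 is whichever barrier is touched first
        t1 = min(t for t in (end, pt_touch, sl_touch) if pd.notna(t))
        record = {"t0": t0, "t1": t1, "trgt": trgt, "pt": pt, "sl": sl}
        if side is not None:
            record["side"] = sign
        records.append(record)
    return pd.DataFrame(records, columns=columns).set_index("t0")
```

The returned frame mirrors the t1, trgt, pt, sl (and optional side) columns described above; get_events additionally spreads the work across num_threads.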

get_bins(triple_barrier_events, close)

Provides the labels based on the triple-barrier method.

The top horizontal barrier, bottom horizontal barrier and vertical barrier are labeled {1, -1, 0}, respectively. Meta-labeling can also be used, in which case we supply the 'side' of the trade and the possible labels become {0, 1}. The ML algorithm is then trained to decide whether to take the bet or pass; when the predicted label is 1, the probability of this secondary prediction can be used to derive the size of the bet, while the side (sign) of the position has been set by the primary model.

Parameters:
  • triple_barrier_events

    (pd.DataFrame) DataFrame consisting of the columns t1, trgt, side.

    • Index - Event's start time.

    • t1 - Time when the first barrier was touched.

    • trgt - The event's target.

    • side - (optional) The algorithm's position side.

    Case 1 ('side' not in events): bin in {-1, 0, 1}, labeled by price action.

    Case 2 ('side' in events): bin in {0, 1}, labeled by PnL (meta-labeling).

  • close – (pd.Series) Close prices.

Returns:

(pd.DataFrame) Labeled events (meta-labeled when 'side' is provided).

References:

  • Advances in Financial Machine Learning, Snippet 3.7, page 51.
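The labeling rules above can be sketched as follows. This is a hypothetical illustration rather than MlFinLab's get_bins (the name `label_sketch` is ours): it assumes the events frame also carries the pt and sl multipliers returned by get_events, uses simple returns from the event start to t1, and treats a return that has not reached either horizontal barrier at t1 as a vertical-barrier touch (label 0):

```python
import numpy as np
import pandas as pd


def label_sketch(events, close):
    """Hypothetical sketch of barrier-based labels; not MlFinLab's get_bins.

    Assumes `events` has the columns t1, trgt, pt, sl (and optionally side).
    """
    events = events.dropna(subset=["t1"])
    out = pd.DataFrame(index=events.index)
    # Simple return from the event start to the first barrier touch
    out["ret"] = close.loc[events["t1"]].values / close.loc[events.index].values - 1
    out["trgt"] = events["trgt"]
    if "side" in events:
        # Case 2 (meta-labeling): 1 if the primary model's bet made money, else 0
        out["ret"] = out["ret"] * events["side"]
        out["side"] = events["side"]
        out["bin"] = (out["ret"] > 0).astype(int)
    else:
        # Case 1: 1 / -1 when the upper / lower barrier was reached, else 0
        upper = (events["pt"] > 0) & (out["ret"] >= events["pt"] * out["trgt"])
        lower = (events["sl"] > 0) & (out["ret"] <= -events["sl"] * out["trgt"])
        out["bin"] = np.select([upper, lower], [1, -1], default=0)
    return out
```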

drop_labels(events, min_pct=0.05)

This function recursively drops observations whose label is too rare.

Parameters:
  • events – (pd.DataFrame) Events.

  • min_pct – (float) Minimum fraction of events a label should represent; a label occurring in less than this fraction of events is considered rare.

Returns:

(pd.DataFrame) Events.

References:
  • Advances in Financial Machine Learning, Snippet 3.8 page 54.
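A minimal sketch of the recursive loop, assuming the events frame has a bin column: it repeatedly drops the rarest label until every remaining label exceeds min_pct or fewer than three labels remain. The stopping rule reflects our reading of Snippet 3.8, and the function name `drop_rare_labels_sketch` is hypothetical:

```python
import pandas as pd


def drop_rare_labels_sketch(events: pd.DataFrame, min_pct: float = 0.05) -> pd.DataFrame:
    """Hypothetical sketch of recursive rare-label removal (not MlFinLab's code)."""
    while True:
        # Share of each label among the remaining events
        shares = events["bin"].value_counts(normalize=True)
        # Stop when every label is frequent enough, or only two classes remain
        if shares.min() > min_pct or shares.shape[0] < 3:
            break
        # Drop the rarest label and re-evaluate the remaining shares
        events = events[events["bin"] != shares.idxmin()]
    return events
```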


Example

To illustrate how we might apply the triple-barrier method, we first calculate the volatility of our series, which will be used as the dynamic threshold for the barriers. The barriers then become the volatility of the series multiplied by the profit-taking and stop-loss levels at each point in the series. We also specify a vertical barrier, which can be expressed either in units of time or as a number of bars.

import yfinance as yf

# Import MlFinLab tools
from mlfinlab.util import volatility
from mlfinlab.filters import filters
from mlfinlab.labeling import labeling

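# auto_adjust=False keeps the "Adj Close" column; squeeze() returns a Series even when
# newer yfinance versions return single-ticker data with MultiIndex columns
data = yf.download(tickers="SPY", period="7d", interval="1m", auto_adjust=False)["Adj Close"].squeeze()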

# calculate the volatility that will be used to dynamically set the barriers
vol = volatility.get_daily_vol(close=data, lookback=50)

# Apply Symmetric CUSUM Filter and get timestamps for events
# Note: Only the CUSUM filter needs a point estimate for volatility
cusum_events = filters.cusum_filter(data, threshold=vol.mean())

# Compute vertical barrier using timedelta
vertical_barriers = labeling.add_vertical_barrier(
    t_events=cusum_events, close=data, num_hours=1
)

# Another option is to compute the vertical barriers after a fixed number of bars
vertical_barriers = labeling.add_vertical_barrier(
    t_events=cusum_events, close=data, num_bars=60
)
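The example relies on MlFinLab's get_daily_vol for the target series. To show what such an estimate involves, the sketch below computes an exponentially weighted standard deviation of returns measured over roughly one day, in the spirit of Snippet 3.1. The function name `daily_vol_sketch` is hypothetical, using lookback as the EWM span is an assumption, and MlFinLab's own implementation may differ in detail:

```python
import pandas as pd


def daily_vol_sketch(close: pd.Series, lookback: int = 50) -> pd.Series:
    """Hypothetical sketch of an EWM daily-volatility estimate (not MlFinLab's code).

    Assumes `close` has a sorted, unique DatetimeIndex.
    """
    # For each bar, find the bar just before the point one day earlier
    pos = close.index.searchsorted(close.index - pd.Timedelta(days=1))
    pos = pos[pos > 0]  # early bars have no data one day back
    prev = close.index[pos - 1]
    curr = close.index[close.shape[0] - pos.shape[0]:]
    # Simple returns over (roughly) one day
    rets = pd.Series(close.loc[curr].values / close.loc[prev].values - 1, index=curr)
    # Exponentially weighted standard deviation of those returns
    return rets.ewm(span=lookback).std()
```

Because the return horizon is defined in calendar time rather than in bars, the same function applies to intraday bars such as the 1-minute SPY data above.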

We can now apply the triple-barrier method using the CUSUM-filtered events (the entire series could also be used, in which case we would set t_events = data.index), the volatility series, and the vertical barriers. All we need to do in addition is specify our profit-taking and stop-loss levels, which will be multiplied by the volatility series to set the upper and lower barriers.

# set the profit taking and stop loss levels
pt_sl = [1, 2]

triple_barrier_events = labeling.get_events(
    close=data,
    t_events=cusum_events,
    pt_sl=pt_sl,
    target=vol,
    num_threads=3,
    vertical_barrier_times=vertical_barriers,
)

# get the triple-barrier labels
labels = labeling.get_bins(triple_barrier_events, data)
print(labels["bin"].value_counts())

Warning

The biggest mistake we see users make here is changing the daily target values to get more observations, since ML models require a fair amount of data. This is the wrong approach!

Doing so ignores the fundamental principle of data quality over data quantity in ML. While ML models do benefit from a sufficient amount of diverse and representative data, simply increasing the quantity of data without considering its quality can be counterproductive.

Please visit the Seven-Point Protocol under the Backtest Overfitting Tools section to learn more about how to think about features and outcomes.


Research Notebook

The following example notebook can be used to better understand the triple-barrier method. It answers the questions in Chapter 3 of Advances in Financial Machine Learning.

  • Chapter 3 Labeling


Presentation Slides

Advances in Financial Machine Learning: Lecture 3/10 (seminar slides)


Note

  • pg 12-16: Labeling Techniques


Labeling - Hudson & Thames presentation



References