Sample Weights


MlFinLab supports two methods of applying sample weights. The first is weighting an observation based on its given return as well as average uniqueness. The second is weighting an observation based on a time decay.


Implementations

By Returns and Average Uniqueness

The following function utilizes a sample's average uniqueness and its return to compute sample weights:

get_weights_by_return(triple_barrier_events, close_series, num_threads=5, verbose=True)

Advances in Financial Machine Learning, Snippet 4.10(part 2), page 69.

Determination of Sample Weight by Absolute Return Attribution

This function is the orchestrator for generating sample weights based on return, using mp_pandas_obj.

Parameters:
  • triple_barrier_events – (pd.DataFrame) Events from labeling.get_events().

  • close_series – (pd.Series) Close prices.

  • num_threads – (int) The number of threads concurrently used by the function.

  • verbose – (bool) Flag to report progress on async jobs.

Returns:

(pd.Series) Sample weights based on return and concurrency.
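To make the computation concrete, the weighting scheme described above (AFML Snippet 4.10) can be sketched as follows. This is a simplified single-threaded illustration, not the library's implementation; the function name `return_attribution_weights` and its two-argument signature are hypothetical. Each event's raw weight is the absolute sum of its per-bar log returns, with each bar's return diluted by the number of concurrently open events, and the weights are then scaled to sum to the number of samples.

```python
import numpy as np
import pandas as pd

def return_attribution_weights(event_ends, close):
    """Sketch of absolute-return-attribution weights (hypothetical helper).

    event_ends: pd.Series mapping each event's start time (index) to its
    end time (values). close: pd.Series of close prices.
    """
    # Log returns of the close series
    log_ret = np.log(close).diff()
    # Concurrency: how many event lifespans cover each bar
    concurrency = pd.Series(0, index=close.index)
    for start, end in event_ends.items():
        concurrency.loc[start:end] += 1
    # Raw weight: |sum of the event's per-bar returns|, each bar's return
    # diluted by the number of concurrent events on that bar
    weights = pd.Series(index=event_ends.index, dtype=float)
    for start, end in event_ends.items():
        weights.loc[start] = (log_ret.loc[start:end] /
                              concurrency.loc[start:end]).sum()
    weights = weights.abs()
    # Scale so the weights sum to the number of samples
    return weights * len(weights) / weights.sum()
```

Taking the absolute value of the summed (rather than per-bar) returns means an event that merely oscillates around its entry price receives little weight, while one with a large net attributed move receives more.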

Example

This function can be utilized as shown below, assuming we have already found our barrier events:

import pandas as pd
from mlfinlab.sample_weights.attribution import get_weights_by_return

barrier_events = pd.read_csv('FILE_PATH', index_col=0, parse_dates=[0, 2])
close_prices = pd.read_csv('FILE_PATH', index_col=0, parse_dates=[0])

sample_weights = get_weights_by_return(barrier_events, close_prices.close,
                                       num_threads=3)

By Time Decay

The following function assigns sample weights using a time decay factor:

get_weights_by_time_decay(triple_barrier_events, close_series, num_threads=5, decay=1, verbose=True)

Advances in Financial Machine Learning, Snippet 4.11, page 70.

Implementation of Time Decay Factors.

Parameters:
  • triple_barrier_events – (pd.DataFrame) Events from labeling.get_events().

  • close_series – (pd.Series) Close prices.

  • num_threads – (int) The number of threads concurrently used by the function.

  • decay – (int) Decay factor:
      - decay = 1 means there is no time decay;
      - 0 < decay < 1 means that weights decay linearly over time, but every observation still receives a strictly positive weight, regardless of how old;
      - decay = 0 means that weights converge linearly to zero as they become older;
      - decay < 0 means that the oldest portion c of the observations receives zero weight (i.e. they are erased from memory).

  • verbose – (bool) Flag to report progress on async jobs.

Returns:

(pd.Series) Sample weights based on time decay factors.
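The decay factors described above (AFML Snippet 4.11) can be sketched as a piecewise-linear function of cumulative uniqueness rather than chronological time, so that redundant overlapping samples do not accelerate the decay. This is an illustrative sketch, not the library's implementation; the name `time_decay_weights` and its signature are hypothetical, and it takes precomputed average-uniqueness values as input.

```python
import pandas as pd

def time_decay_weights(avg_uniqueness, decay=1.0):
    """Sketch of piecewise-linear time-decay factors (hypothetical helper).

    avg_uniqueness: pd.Series of each sample's average uniqueness,
    indexed by event time (oldest first). The newest observation keeps
    weight 1.0; older observations decay according to `decay`.
    """
    # Decay runs over cumulative uniqueness, not calendar time
    cum_uniq = avg_uniqueness.sort_index().cumsum()
    if decay >= 0:
        # decay is the weight assigned to the oldest observation
        slope = (1.0 - decay) / cum_uniq.iloc[-1]
    else:
        # decay < 0: the oldest fraction of observations is erased
        slope = 1.0 / ((decay + 1) * cum_uniq.iloc[-1])
    const = 1.0 - slope * cum_uniq.iloc[-1]
    weights = const + slope * cum_uniq
    weights[weights < 0] = 0  # clip erased observations to zero weight
    return weights
```

With decay = 1 every observation keeps weight 1.0; with 0 < decay < 1 the weights rise linearly from decay (oldest) to 1.0 (newest).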

Example

This function can be utilized as shown below, assuming we have already found our barrier events:

>>> # Import packages
>>> import pandas as pd
>>> # Import MlFinLab tools
>>> from mlfinlab.util import volatility
>>> from mlfinlab.labeling import labeling
>>> from mlfinlab.sample_weights.attribution import get_weights_by_return
>>> # Load data
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/sample_dollar_bars.csv"
>>> close_prices = pd.read_csv(url, index_col=0, parse_dates=[0])["close"]

>>> # Calculate the volatility that will be used to dynamically set the barriers
>>> vol = volatility.get_daily_vol(close=close_prices, lookback=50)
>>> # Compute vertical barrier using timedelta
>>> vertical_barriers = labeling.add_vertical_barrier(
...     t_events=close_prices.index, close=close_prices, num_hours=1
... )
>>> # Set profit taking and stop loss levels
>>> pt_sl = [1, 2]
>>> triple_barrier_events = labeling.get_events(
...     close=close_prices,
...     t_events=close_prices.index,
...     pt_sl=pt_sl,
...     target=vol,
...     num_threads=3,
...     vertical_barrier_times=vertical_barriers,
... )
>>> # Calculate sample weights by return attribution
>>> sample_weights = get_weights_by_return(
...     triple_barrier_events.dropna(), close_prices, num_threads=3
... )
>>> print(sample_weights[:4])
2011-08-02 06:46:46.576    6.985295
2011-08-02 07:31:03.237    1.453463
2011-08-02 09:07:37.276    0.264688
2011-08-02 10:52:48.191    0.264480
dtype: float64

Research Notebook

The following research notebooks can be used to better understand the previously discussed sampling methods:

Note

This is the same notebook as seen in the Sample Uniqueness docs.
