Sample Weights
MlFinLab supports two methods of applying sample weights. The first weights an observation based on its return and its average uniqueness. The second weights an observation based on time decay.
Implementations
By Returns and Average Uniqueness
The following function uses a sample's average uniqueness and its return to compute sample weights:
- get_weights_by_return(triple_barrier_events, close_series, num_threads=5, verbose=True)

  Advances in Financial Machine Learning, Snippet 4.10 (part 2), page 69.

  Determination of Sample Weight by Absolute Return Attribution.

  This function is an orchestrator that generates sample weights based on return, using mp_pandas_obj.

  - Parameters:
    - triple_barrier_events – (pd.DataFrame) Events from labeling.get_events().
    - close_series – (pd.Series) Close prices.
    - num_threads – (int) The number of threads concurrently used by the function.
    - verbose – (bool) Flag to report progress on async jobs.
  - Returns:
    - (pd.Series) Sample weights based on return and concurrency.
Example
This function can be used as shown below, assuming we have already found our barrier events:
import pandas as pd
import numpy as np
from mlfinlab.sample_weights.attribution import get_weights_by_return
barrier_events = pd.read_csv('FILE_PATH', index_col=0, parse_dates=[0,2])
close_prices = pd.read_csv('FILE_PATH', index_col=0, parse_dates=[0])
sample_weights = get_weights_by_return(barrier_events, close_prices.close,
                                       num_threads=3)
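As a rough illustration of the absolute return attribution idea behind this function (not the library's implementation), each label's raw weight can be taken as the absolute sum of its log-returns, each return divided by the number of concurrent labels at that bar, then normalized so the weights sum to the number of labels. The helper name and the synthetic data below are hypothetical:

```python
import numpy as np
import pandas as pd

def weights_by_return_sketch(label_endtimes, close):
    """Hypothetical sketch of absolute return attribution.

    label_endtimes: pd.Series mapping each label's start time to its end time.
    close: pd.Series of close prices.
    """
    log_ret = np.log(close).diff()  # log-return of each bar
    # Concurrency: how many labels are "alive" at each bar.
    concurrency = pd.Series(0, index=close.index)
    for t0, t1 in label_endtimes.items():
        concurrency.loc[t0:t1] += 1
    # Each label's raw weight is |sum of r_t / c_t| over its lifespan.
    weights = pd.Series(index=label_endtimes.index, dtype=float)
    for t0, t1 in label_endtimes.items():
        weights.loc[t0] = abs((log_ret.loc[t0:t1] / concurrency.loc[t0:t1]).sum())
    # Normalize so the weights sum to the number of labels.
    return weights * len(weights) / weights.sum()

# Synthetic example: six daily bars, two overlapping labels
idx = pd.date_range("2020-01-01", periods=6, freq="D")
close = pd.Series([100.0, 101.0, 103.0, 102.0, 104.0, 105.0], index=idx)
label_endtimes = pd.Series([idx[2], idx[4]], index=[idx[0], idx[1]])
w = weights_by_return_sketch(label_endtimes, close)
print(w)  # two positive weights summing to 2
```

Returns earned while many labels overlap are shared among them, so mostly-overlapping labels receive smaller weights than labels that capture returns alone.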
By Time Decay
The following function assigns sample weights using a time decay factor:
- get_weights_by_time_decay(triple_barrier_events, close_series, num_threads=5, decay=1, verbose=True)

  Advances in Financial Machine Learning, Snippet 4.11, page 70.

  Implementation of Time Decay Factors.

  - Parameters:
    - triple_barrier_events – (pd.DataFrame) Events from labeling.get_events().
    - close_series – (pd.Series) Close prices.
    - num_threads – (int) The number of threads concurrently used by the function.
    - decay – (int) Decay factor:
      - decay = 1 means there is no time decay;
      - 0 < decay < 1 means that weights decay linearly over time, but every observation still receives a strictly positive weight, regardless of how old;
      - decay = 0 means that weights converge linearly to zero as they become older;
      - decay < 0 means that the oldest portion c of the observations receives zero weight (i.e. they are erased from memory).
    - verbose – (bool) Flag to report progress on async jobs.
  - Returns:
    - (pd.Series) Sample weights based on time decay factors.
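The decay scheme described above can be sketched as follows. This is an illustrative reimplementation of the piecewise-linear decay from AFML Snippet 4.11, not the library's code, and it assumes an average-uniqueness series has already been computed; the function name is hypothetical:

```python
import pandas as pd

def time_decay_sketch(avg_uniqueness, decay=1.0):
    """Hypothetical sketch of piecewise-linear time decay: weights are a
    linear function of cumulative average uniqueness, and the newest
    observation always receives a weight of 1."""
    clw = avg_uniqueness.sort_index().cumsum()  # cumulative uniqueness, oldest first
    if decay >= 0:
        slope = (1.0 - decay) / clw.iloc[-1]
    else:
        # decay < 0: the oldest portion of observations is erased entirely
        slope = 1.0 / ((decay + 1) * clw.iloc[-1])
    const = 1.0 - slope * clw.iloc[-1]
    weights = const + slope * clw
    weights[weights < 0] = 0.0  # clip erased observations to zero weight
    return weights

# Five equally unique observations
uniq = pd.Series(1.0, index=range(5))
print(time_decay_sketch(uniq, decay=1))    # no decay: all weights equal 1
print(time_decay_sketch(uniq, decay=0))    # weights rise linearly from 0.2 to 1
print(time_decay_sketch(uniq, decay=-0.5)) # oldest observations clipped to 0
```

Because decay is applied to cumulative uniqueness rather than to raw time, redundant (low-uniqueness) stretches of history are discounted more slowly than highly unique ones.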
Example
This function can be used as shown below, assuming we have already found our barrier events:
>>> # Import packages
>>> import pandas as pd
>>> # Import MlFinLab tools
>>> from mlfinlab.util import volatility
>>> from mlfinlab.labeling import labeling
>>> from mlfinlab.sample_weights.attribution import get_weights_by_time_decay
>>> # Load data
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/sample_dollar_bars.csv"
>>> close_prices = pd.read_csv(url, index_col=0, parse_dates=[0])["close"]
>>> # Calculate the volatility that will be used to dynamically set the barriers
>>> vol = volatility.get_daily_vol(close=close_prices, lookback=50)
>>> # Compute vertical barrier using timedelta
>>> vertical_barriers = labeling.add_vertical_barrier(
...     t_events=close_prices.index, close=close_prices, num_hours=1
... )
>>> # Set profit taking and stop loss levels
>>> pt_sl = [1, 2]
>>> triple_barrier_events = labeling.get_events(
...     close=close_prices,
...     t_events=close_prices.index,
...     pt_sl=pt_sl,
...     target=vol,
...     num_threads=3,
...     vertical_barrier_times=vertical_barriers,
... )
>>> # Compute sample weights with a time decay factor
>>> sample_weights = get_weights_by_time_decay(
...     triple_barrier_events.dropna(), close_prices, num_threads=3, decay=0.5
... )
Research Notebook
The following research notebook can be used to better understand the previously discussed sampling methods.
Note
This is the same notebook as seen in the Sample Uniqueness docs.