Filters

Filters let users and researchers in financial machine learning select the observations that are worth examining. A filter applies a specific conditional trigger to the data and keeps only the events that satisfy it, so the analysis concentrates on the most important market data and disregards irrelevant noise that would otherwise skew it.

A commonly used filter is the structural break filter, which identifies major shifts in market behavior. It keeps the instances at which a structural break takes place and excludes all other events. This filter is especially helpful for recognizing moments when momentum changes direction, which may signal a new market trend.

Once events have been filtered, a labeling technique can be applied to them. Labeling only the filtered events lets users assess their trading strategies on the events that are most likely to produce positive returns. The same labels can also flag market events that may result in significant losses, so that strategies can be adjusted accordingly.

The main idea is that attempting to label every trading day is neither practical nor effective. Filters therefore play a crucial role in financial machine learning: by restricting the sample to the more relevant observations, they help avoid common overfitting issues and can lead to more accurate and profitable trading strategies.

Tip

If you try to forecast the direction of the next day's move from daily OHLC data, for each and every day, you have a very high likelihood of failure.
You need to pay close attention to which features will be informative, that is, which features contain relevant information that helps the model forecast the target variable.
We have never seen price data alone, combined with technical indicators, work in forecasting the next day's direction.

Note

Underlying Literature

The following sources elaborate extensively on the topic:

Advances in Financial Machine Learning, Chapter 17 by Marcos Lopez de Prado.

CUSUM Filter

The CUSUM filter is a quality-control method designed to detect a shift in the mean value of a measured quantity away from a target value. The filter is set up to identify a sequence of upside or downside divergences from any reset level zero. We sample a bar t if and only if S_t >= threshold, where S_t is the cumulative sum of deviations accumulated since the last reset, at which point S_t is reset to 0.
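
Written out in the notation of de Prado (2018), the symmetric filter tracks two cumulative sums. The symbols y_t (the observed series), E_{t-1}[y_t] (its expected value given information up to t-1), S_t^+ and S_t^- are introduced here for illustration and do not appear in the function signature:

\[S_t^{+} = \max\{0,\ S_{t-1}^{+} + y_t - \mathrm{E}_{t-1}[y_t]\}, \quad S_0^{+} = 0\]

\[S_t^{-} = \min\{0,\ S_{t-1}^{-} + y_t - \mathrm{E}_{t-1}[y_t]\}, \quad S_0^{-} = 0\]

\[S_t = \max\{S_t^{+},\ -S_t^{-}\}\]

Under the common choice E_{t-1}[y_t] = y_{t-1}, each increment reduces to the change y_t - y_{t-1}, so the filter accumulates successive changes until one side reaches the threshold.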

One practical aspect that makes CUSUM filters appealing is that multiple events are not triggered by raw_time_series hovering around a threshold level, which is a flaw suffered by popular market signals such as Bollinger Bands. It will require a full run of length threshold for raw_time_series to trigger an event.
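
The behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name symmetric_cusum_sketch, the use of plain first differences, and the toy price series are assumptions made for illustration. A fixed level stands in for a band-style signal to show how hovering prices produce repeated crossings while the CUSUM sums trigger only after a sustained run.

```python
import pandas as pd


def symmetric_cusum_sketch(series, threshold):
    """Illustrative symmetric CUSUM filter on first differences (hypothetical helper)."""
    events = []
    s_pos, s_neg = 0.0, 0.0
    for timestamp, change in series.diff().dropna().items():
        # Accumulate upside and downside moves separately, never crossing zero.
        s_pos = max(0.0, s_pos + change)
        s_neg = min(0.0, s_neg + change)
        if s_neg <= -threshold:
            s_neg = 0.0  # reset after a downside event
            events.append(timestamp)
        elif s_pos >= threshold:
            s_pos = 0.0  # reset after an upside event
            events.append(timestamp)
    return pd.DatetimeIndex(events)


# Toy series: prices hover around 101, then rise steadily.
index = pd.date_range("2024-01-01", periods=12, freq="D")
prices = pd.Series(
    [100, 102, 100, 102, 100, 102, 100, 101, 102, 103, 104, 105],
    index=index,
    dtype=float,
)

cusum_events = symmetric_cusum_sketch(prices, threshold=3.5)

# Naive alternative: flag every upward crossing of a fixed level.
above = prices > 101
level_crossings = above & ~above.shift(1, fill_value=False)

print(len(cusum_events), int(level_crossings.sum()))
```

In this toy series the alternating moves of size 2 never accumulate to the threshold of 3.5, so only the final steady rise produces a CUSUM event, whereas the fixed level is crossed repeatedly while prices hover around it.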

Once we have obtained this subset of event-driven bars, we will let the ML algorithm determine whether the occurrence of such events constitutes actionable intelligence. Below is an implementation of the Symmetric CUSUM filter.

CUSUM sampling of a price series (de Prado, 2018)

Implementation

cusum_filter(close_prices, threshold, time_stamps=True)

Advances in Financial Machine Learning, Snippet 2.4, page 39.

The Symmetric Dynamic/Fixed CUSUM Filter.

Note: As per the book, this filter is applied to closing prices, but we extended it to also work on other time series such as volatility.

Parameters:

close_prices – (pd.Series) Close prices (or other time series, e.g. volatility).
threshold – (float/pd.Series) When abs(change) is larger than the threshold, the function captures it as an event. The threshold can be dynamic if it is passed as a pd.Series.
time_stamps – (bool) Default is to return a DatetimeIndex; set to False to return a list.

Returns:

(datetime index vector) Vector of datetimes when the events occurred. This is used later to sample.

Example

An example showing how the CUSUM filter can be used to downsample a time series of close prices can be seen below:

>>> import pandas as pd
>>> from mlfinlab.filters.filters import cusum_filter
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col="date_time")
>>> data.index = pd.to_datetime(data.index)
>>> cusum_events = cusum_filter(data["close"], threshold=0.05)
>>> cusum_events.values  
array(...)
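
The threshold can also vary through time. The following is a minimal sketch, not taken from the library, of one way to build a pd.Series threshold from rolling volatility using pandas alone. The synthetic prices, the 20-bar window, the multiplier of 2, and the back-fill of the warm-up period are illustrative assumptions, as is the premise that the threshold should be expressed in the same units as the changes the filter accumulates.

```python
import numpy as np
import pandas as pd

# Synthetic close prices (illustrative only, not real market data).
rng = np.random.default_rng(seed=0)
index = pd.date_range("2024-01-01", periods=200, freq="D")
close = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=200))), index=index
)

# Rolling volatility of log returns, scaled by a hypothetical multiplier.
log_returns = np.log(close).diff()
dynamic_threshold = 2 * log_returns.rolling(window=20).std()

# The warm-up values are undefined; back-filling them is an assumption made
# here so that every timestamp in `close` has a threshold.
dynamic_threshold = dynamic_threshold.bfill()

# Passing the series in place of a float (requires mlfinlab):
# cusum_events = cusum_filter(close, threshold=dynamic_threshold)
```

A volatility-scaled threshold of this kind makes the filter demand larger cumulative moves in turbulent periods and smaller ones in quiet periods.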

Z-Score Filter

The Z-Score filter is a method used to identify explosive points, peaks, and troughs in a time series.

The Z-score filter takes as arguments the window of the simple rolling mean, the window of the simple rolling standard deviation, a Z-score threshold, and an influence parameter. When the absolute value of the difference between the current time series data point and the rolling average exceeds the threshold times the rolling standard deviation, an event is triggered, i.e. when

\[|\text{Point} - \text{RollingMean}| > \text{Zscore} \times \text{RollingSTD}\]

The influence parameter determines how much detected events affect the algorithm’s detection threshold. If it is set to 0, detected events have no influence on the threshold: future events are detected with a mean and standard deviation that are not affected by past detected events. If it is 0.5, signals have half the influence of normal (non-signal) data points. If it is 1, the method reduces to the standard Z-score filter, in which every point has full influence on the rolling mean and standard deviation.
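
The interaction between the threshold and the influence parameter can be sketched in a few lines. The function below is a simplified illustration in the spirit of the smoothed Z-score algorithm; it is not the library's code. The name z_score_events_sketch, the use of the preceding window only (excluding the current point), the population standard deviation, and the toy series are all assumptions made here.

```python
import numpy as np
import pandas as pd


def z_score_events_sketch(series, mean_window, std_window, z_score=3.0, influence=1.0):
    """Illustrative Z-score event filter with an influence parameter (hypothetical helper)."""
    values = series.to_numpy(dtype=float)
    filtered = values.copy()  # history used to compute the rolling statistics
    events = []
    for t in range(max(mean_window, std_window), len(values)):
        rolling_mean = filtered[t - mean_window:t].mean()
        rolling_std = filtered[t - std_window:t].std()
        if abs(values[t] - rolling_mean) > z_score * rolling_std:
            events.append(series.index[t])
            # Dampen the event before it enters the rolling statistics.
            filtered[t] = influence * values[t] + (1 - influence) * filtered[t - 1]
    return pd.DatetimeIndex(events)


# Toy series: a quiet regime, a large spike, then a smaller spike shortly after.
index = pd.date_range("2024-01-01", periods=26, freq="D")
prices = pd.Series(
    [100.0, 101.0] * 10 + [110.0, 101.0, 100.0, 105.0, 100.0, 101.0], index=index
)

events_no_influence = z_score_events_sketch(prices, 10, 10, z_score=3, influence=0)
events_full_influence = z_score_events_sketch(prices, 10, 10, z_score=3, influence=1)

print(len(events_no_influence), len(events_full_influence))
```

With influence=0 the first spike is kept out of the rolling statistics, so the smaller spike that follows is still detected. With influence=1 the first spike inflates the rolling standard deviation and the second spike no longer exceeds the threshold.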

Implementation

z_score_filter(raw_time_series, mean_window, std_window, z_score=3, time_stamps=True, influence=1)

Filter that implements the Z-score filter (see https://stackoverflow.com/questions/22583391/peak-signal-detection-in-realtime-timeseries-data).

Parameters:

raw_time_series – (pd.Series) Close prices (or other time series, e.g. volatility).
mean_window – (int) Rolling mean window.
std_window – (int) Rolling std window.
z_score – (float) Number of standard deviations to trigger the event.
influence – (float) Determines the extent to which previously detected events influence the rolling mean and standard deviation. If the value is 0, an event has no influence on the rolling mean and standard deviation. At 0.5, it has half the influence of a normal data point. At 1, the method operates as a standard Z-Score filter. This value should be in the range 0 to 1.
time_stamps – (bool) Default is to return a DatetimeIndex; set to False to return a list.

Returns:

(datetime index vector) Vector of datetimes when the events occurred. This is used later to sample.

Example

An example of how the Z-score filter can be used to downsample a time series:

>>> import pandas as pd
>>> from mlfinlab.filters.filters import z_score_filter
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col="date_time")
>>> data.index = pd.to_datetime(data.index)
>>> # Events from a Z-Score filter with influence 0.5
>>> z_score_events_inf = z_score_filter(
...     data["close"], mean_window=100, std_window=100, z_score=2, influence=0.5
... )

>>> # Events from a standard Z-Score filter
>>> z_score_events = z_score_filter(
...     data["close"], mean_window=100, std_window=100, z_score=2
... )

References

de Prado, M.L., 2018. Advances in financial machine learning. John Wiley & Sons.