Backtest Statistics
The Backtest Statistics module contains functions related to characteristic analysis of returns and target positions. These include:
- Sharpe ratios (annualized, probabilistic, deflated).
- Information ratio.
- Minimum Required Track Record Length.
- Concentration of bets for positive and negative returns.
- Drawdown & Time Under Water.
- Average holding period from a series of positions.
- Filtering flips and flattenings from a series of returns.
Note
Underlying Literature
The following sources elaborate extensively on the topic:
- Advances in Financial Machine Learning, Chapters 14 & 15, by Marcos Lopez de Prado.
- The Sharpe Ratio Efficient Frontier by David H. Bailey and Marcos Lopez de Prado. Provides a deeper understanding of the Sharpe ratios implemented and the Minimum Track Record Length.
Additionally, we have implemented the award-winning framework by Campbell R. Harvey and Yan Liu, in particular the Haircut Sharpe Ratio and Profit Hurdle algorithms.
The following sources elaborate further:
- Backtesting by Campbell R. Harvey and Yan Liu. The paper provides a deeper understanding of the Haircut Sharpe Ratio and Profit Hurdle algorithms. The code in this module is based on the code written by the researchers.
- … and the Cross-Section of Expected Returns by Harvey, C.R., Y. Liu, and H. Zhu. Describes a structural model to capture trading strategies' underlying distribution, referred to as the HLZ model.
- The Statistics of Sharpe Ratios by Lo, A. Gives a broader understanding of Sharpe ratio adjustments for autocorrelation and different time periods.
Annualized Sharpe Ratio
Calculates Annualized Sharpe Ratio for pd.Series of normal or log returns.
A common metric of return in relation to risk. It also takes into account the number of return entries per year and the risk-free rate. The risk-free rate should be given for the same period as the returns. For example, if the input returns are observed every 3 months, the risk-free rate given should be the 3-month risk-free rate.
Calculated as:
\[ SR = \frac{E[r] - r_{f}}{\sigma[r]} \sqrt{n} \]
where \(r\) are the periodic returns, \(r_{f}\) is the risk-free rate for the same period, and \(n\) is the number of return entries per year.
Generally, the higher the Sharpe ratio, the better.
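As a quick illustration, below is a minimal by-hand sketch of the formula above using a short, made-up series of simple returns (the series and the figures are illustrative only and are not produced by the library):
>>> import numpy as np
>>> import pandas as pd
>>> rets = pd.Series([0.01, -0.005, 0.02, 0.0, 0.015])  # hypothetical daily returns
>>> # Mean excess return over a zero risk-free rate, divided by the standard
>>> # deviation of returns, scaled by the square root of entries per year
>>> sr = (rets.mean() - 0.0) / rets.std() * np.sqrt(252)
>>> round(float(sr), 1)
12.2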
Implementation
- sharpe_ratio(returns: Series, entries_per_year: int = 252, risk_free_rate: float = 0) → float
  Calculates annualized Sharpe ratio for pd.Series of normal or log returns.
  The risk-free rate should be given for the same period as the returns. For example, if the input returns are observed every 3 months, the risk-free rate given should be the 3-month risk-free rate.
  - Parameters:
    - returns – (pd.Series) Returns - normal or log.
    - entries_per_year – (int) Times returns are recorded per year (252 by default).
    - risk_free_rate – (float) Risk-free rate (0 by default).
  - Returns:
    - (float) Annualized Sharpe ratio.
Example
An example showing how the Annualized Sharpe Ratio function is used with daily log returns data:
>>> import pandas as pd
>>> from mlfinlab.backtest_statistics.statistics import sharpe_ratio
>>> # Read in the returns data for ticker AAPL, Apple Inc
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/logReturns.csv"
>>> returns = pd.read_csv(url)["AAPL"]
>>> # Calculate the Sharpe ratio for our returns data
>>> sharpe_r = sharpe_ratio(returns, entries_per_year=252)
>>> sharpe_r
1.06...
Haircut Sharpe Ratio
Adjusts the Sharpe Ratio due to multiple testing.
This algorithm lets the user calculate the Sharpe ratio adjustments and the corresponding haircuts based on the key parameters of the data used in the strategy backtests. For each of the adjustment methods - Bonferroni, Holm, BHY (Benjamini, Hochberg, and Yekutieli), and their average - the algorithm calculates an adjusted p-value, a haircut Sharpe ratio, and the haircut.
The haircut is the percentage difference between the original Sharpe ratio and the new Sharpe ratio.
The inputs of the method include information about the returns that were used to calculate the observed Sharpe ratio. In particular:
- The frequency at which the returns were observed.
- The number of returns observed.
- The observed Sharpe ratio.
- Whether the observed Sharpe ratio is annualized and whether it is adjusted for the autocorrelation of returns (described in the paper by Lo, A.).
- The autocorrelation coefficient of returns.
- The number of tests in multiple testing allowed (described in the first two papers from the introduction).
- The average correlation among strategy returns.
Adjustment methods include:
- Bonferroni
- Holm
- Benjamini, Hochberg, and Yekutieli (BHY)
- Average of the methods above
The method returns an np.array with adjusted p-values, adjusted Sharpe ratios, and haircuts as rows. Elements in a row are ordered by adjustment method as [Bonferroni, Holm, BHY, Average].
The Haircut Sharpe Ratio algorithm consists of the following steps:
1. We are given the observed Sharpe ratio \(SR\) over \(T\) periods; based on this information we can calculate the p-value of a single test, \(p^S\) (a sketch of this step follows the list).
2. Assuming that \(N\) other strategies have been tried and that the average correlation of returns from the strategies is \(\rho\), we use the HLZ model to generate \(N\) t-statistics from the model. We also transform the calculated \(p^S\) to a t-statistic.
3. These \(N+1\) t-statistics are transformed back to p-values, taking into account the data-mining adjustment.
4. This set of \(N+1\) p-values is fed to the two methods described above (Holm and BHY) to get the adjusted p-values under each method. (The Bonferroni adjustment is calculated using only \(p^S\) and \(N\).)
5. Steps 2-4 are repeated multiple times (simulations).
6. For each of the two methods, we eventually have a set of adjusted p-values \(p^M\). The median of this set is the final adjusted p-value of the method. We thus obtain p-values for each of the three methods and calculate the Average p-value as their mean.
7. The obtained p-value of each method can then be transformed back to a Sharpe ratio, and the haircut can be calculated.
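To make step 1 concrete, here is a minimal sketch of turning an observed Sharpe ratio into a single-test p-value, under simplifying assumptions (the Sharpe ratio is already in per-period units, the test is two-sided, and no autocorrelation adjustment is applied); this is not the library's exact code:
>>> import numpy as np
>>> from scipy.stats import norm
>>> sr_monthly, num_obs = 1 / np.sqrt(12), 24  # annualized SR of 1, 24 monthly returns
>>> t_stat = sr_monthly * np.sqrt(num_obs)     # t-ratio of the observed SR
>>> p_single = 2 * (1 - norm.cdf(t_stat))      # p-value of a single test
>>> round(float(p_single), 2)
0.16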
Implementation
- class CampbellBacktesting(simulations=2000)
  This class implements the Haircut Sharpe Ratios and Profit Hurdles algorithms described in the following paper: Campbell R. Harvey and Yan Liu, Backtesting, (Fall 2015), Journal of Portfolio Management, 2015. The code is based on the code provided by the authors of the paper.
  The Haircut Sharpe Ratios algorithm lets the user adjust the observed Sharpe ratios to take multiple testing into account and calculate the corresponding haircuts. The haircut is the percentage difference between the original Sharpe ratio and the new Sharpe ratio.
  The Profit Hurdle algorithm lets the user calculate the required mean return for a strategy at a given level of significance, taking multiple testing into account.
  - __init__(simulations=2000)
    Set the desired number of simulations to make in the Haircut Sharpe Ratios or Profit Hurdle algorithms.
    - Parameters:
      - simulations – (int) Number of simulations.
- haircut_sharpe_ratios(sampling_frequency, num_obs, sharpe_ratio, annualized, autocorr_adjusted, rho_a, num_mult_test, rho)
  Calculates the adjusted Sharpe ratio due to testing multiplicity.
  This algorithm lets the user calculate Sharpe ratio adjustments and the corresponding haircuts based on the key parameters of returns from the strategy. The adjustment methods are Bonferroni, Holm, BHY (Benjamini, Hochberg and Yekutieli) and the Average of them. The algorithm calculates the adjusted p-value, the adjusted Sharpe ratio and the haircut.
  The haircut is the percentage difference between the original Sharpe ratio and the new Sharpe ratio.
  - Parameters:
    - sampling_frequency – (str) Sampling frequency ['D', 'W', 'M', 'Q', 'A'] of returns.
    - num_obs – (int) Number of returns in the frequency specified in the previous step.
    - sharpe_ratio – (float) Sharpe ratio of the strategy. Either annualized or in the frequency specified in the previous step.
    - annualized – (bool) Flag if Sharpe ratio is annualized.
    - autocorr_adjusted – (bool) Flag if Sharpe ratio was adjusted for returns autocorrelation.
    - rho_a – (float) Autocorrelation coefficient of returns at the specified frequency (if the Sharpe ratio wasn't corrected).
    - num_mult_test – (int) Number of other strategies tested (multiple tests).
    - rho – (float) Average correlation among returns of strategies tested.
  - Returns:
    - (np.array) Array with the adjusted p-value, adjusted Sharpe ratio, and haircut as rows, for the Bonferroni, Holm, BHY and Average adjustments as columns.
Example
An example showing how the Haircut Sharpe Ratios method is used can be seen below:
>>> from mlfinlab.backtest_statistics.backtests import CampbellBacktesting
>>> # Specify the desired number of simulations
>>> backtesting = CampbellBacktesting(4000)
>>> # In this example: an annualized Sharpe ratio of 1, not adjusted for a returns
>>> # autocorrelation of 0.1, calculated on monthly observations of returns over two
>>> # years (24 total observations), with 10 other strategies tested and an average
>>> # correlation among returns of 0.4
>>> haircuts = backtesting.haircut_sharpe_ratios(
... sampling_frequency="M",
... num_obs=24,
... sharpe_ratio=1,
... annualized=True,
... autocorr_adjusted=False,
... rho_a=0.1,
... num_mult_test=10,
... rho=0.4,
... )
>>> # Adjusted Sharpe ratios by method used
>>> sr_adj_bonferroni = haircuts[1][0]
>>> sr_adj_bonferroni
4.74...
>>> sr_adj_holm = haircuts[1][1]
>>> sr_adj_holm
4.74...
>>> sr_adj_bhy = haircuts[1][2]
>>> round(sr_adj_bhy, 1)
0.1...
>>> sr_adj_average = haircuts[1][3]
>>> round(sr_adj_average, 2)
0.03...
Probabilistic Sharpe Ratio
Calculates the probabilistic Sharpe ratio (PSR), which provides an adjusted estimate of SR by removing the inflationary effect caused by short series with skewed and/or fat-tailed returns.
Given a user-defined benchmark Sharpe ratio and an observed Sharpe ratio, PSR estimates the probability that the observed Sharpe ratio \(\widehat{SR}\) is greater than the hypothetical SR.
If PSR exceeds 0.95, then \(\widehat{SR}\) is higher than the hypothetical (benchmark) SR at the standard significance level of 5%.
Formula for calculation:
\[ PSR(SR^{*}) = Z\left[\frac{(\widehat{SR} - SR^{*})\sqrt{T-1}}{\sqrt{1 - \gamma_3\widehat{SR} + \frac{\gamma_4 - 1}{4}\widehat{SR}^2}}\right] \]
Where:
- \(SR^{*}\) - benchmark Sharpe ratio
- \(\widehat{SR}\) - estimate of the Sharpe ratio
- \(Z[..]\) - cumulative distribution function (CDF) of the standard Normal distribution
- \(T\) - number of observed returns
- \(\gamma_3\) - skewness of the returns
- \(\gamma_4\) - kurtosis of the returns
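Below is a by-hand cross-check of the formula, reproducing the probabilistic_sharpe_ratio(1.2, 1.0, 200) call from the example below (a sketch under the default skewness of 0 and kurtosis of 3; not the library's exact code):
>>> import numpy as np
>>> from scipy.stats import norm
>>> sr_obs, sr_star, t_obs, skew, kurt = 1.2, 1.0, 200, 0.0, 3.0
>>> stat = (sr_obs - sr_star) * np.sqrt(t_obs - 1)
>>> stat /= np.sqrt(1 - skew * sr_obs + (kurt - 1) / 4 * sr_obs**2)
>>> round(float(norm.cdf(stat)), 2)  # PSR = Z[stat]
0.98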
Implementation
- probabilistic_sharpe_ratio(observed_sr: float, benchmark_sr: float, number_of_returns: int, skewness_of_returns: float = 0, kurtosis_of_returns: float = 3) → float
  Calculates the probabilistic Sharpe ratio (PSR), which provides an adjusted estimate of SR by removing the inflationary effect caused by short series with skewed and/or fat-tailed returns.
  Given a user-defined benchmark Sharpe ratio and an observed Sharpe ratio, PSR estimates the probability that the observed Sharpe ratio is greater than the hypothetical SR.
  - It should exceed 0.95 for the standard significance level of 5%.
  - It can be computed on absolute or relative returns.
  - Parameters:
    - observed_sr – (float) Sharpe ratio that is observed.
    - benchmark_sr – (float) Sharpe ratio to which observed_sr is tested against.
    - number_of_returns – (int) Times returns are recorded for observed_sr.
    - skewness_of_returns – (float) Skewness of returns (0 by default).
    - kurtosis_of_returns – (float) Kurtosis of returns (3 by default).
  - Returns:
    - (float) Probabilistic Sharpe ratio.
Example
An example showing how the Probabilistic Sharpe Ratio function is used:
>>> from mlfinlab.backtest_statistics.statistics import probabilistic_sharpe_ratio
>>> psr = probabilistic_sharpe_ratio(1.2, 1.0, 200)
>>> psr
0.98...
Deflated Sharpe Ratio
Calculates the deflated Sharpe ratio (DSR) - a PSR where the rejection threshold is adjusted to reflect the multiplicity of trials. DSR is estimated as PSR[SR*], where the benchmark Sharpe ratio, SR*, is no longer user-defined but calculated from the trials of SR estimates.
DSR corrects SR for inflationary effects caused by non-Normal returns, track record length, and multiple testing/selection bias.
Given an observed Sharpe ratio and a set of SR estimates (or their properties - the standard deviation and the number of trials), DSR estimates the probability that SR is greater than the hypothetical (benchmark) SR. The function also allows outputting the hypothetical (benchmark) SR.
If DSR exceeds 0.95, then SR is higher than the hypothetical (benchmark) SR at the standard significance level of 5%.
The hypothetical SR is calculated as:
\[ SR^{*} = \sqrt{V[\{SR_{n}\}]}\left((1 - \gamma)Z^{-1}\left[1 - \frac{1}{N}\right] + \gamma Z^{-1}\left[1 - \frac{1}{N}e^{-1}\right]\right) \]
Where:
- \(SR^{*}\) - benchmark Sharpe ratio
- \(\{SR_{n}\}\) - trials of SR estimates
- \(Z^{-1}[..]\) - inverse of the cumulative distribution function (CDF) of the standard Normal distribution
- \(N\) - number of SR trials
- \(\gamma\) - Euler-Mascheroni constant
- \(e\) - Euler's number
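Below is a minimal sketch of this benchmark calculation for the trials [1.0, 1.1, 1.0] used in the example below (assuming the population standard deviation of the trials; the library may use a different estimator):
>>> import numpy as np
>>> from scipy.stats import norm
>>> estimates = [1.0, 1.1, 1.0]
>>> n = len(estimates)
>>> gamma = 0.5772156649  # Euler-Mascheroni constant
>>> sr_star = np.std(estimates) * ((1 - gamma) * norm.ppf(1 - 1 / n)
...                                + gamma * norm.ppf(1 - 1 / (n * np.e)))
>>> round(float(sr_star), 2)  # the benchmark SR that PSR is then evaluated against
0.04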
Implementation
- deflated_sharpe_ratio(observed_sr: float, sr_estimates: list, number_of_returns: int, skewness_of_returns: float = 0, kurtosis_of_returns: float = 3, estimates_param: bool = False, benchmark_out: bool = False) → float
  Calculates the deflated Sharpe ratio (DSR) - a PSR where the rejection threshold is adjusted to reflect the multiplicity of trials. DSR is estimated as PSR[SR*], where the benchmark Sharpe ratio, SR*, is no longer user-defined but calculated from the trials of SR estimates.
  DSR corrects SR for inflationary effects caused by non-Normal returns, track record length, and multiple testing/selection bias.
  - It should exceed 0.95 for the standard significance level of 5%.
  - It can be computed on absolute or relative returns.
  The function allows outputting the calculated SR benchmark, and allows using only the standard deviation and the number of SR trials instead of the full list of trials.
  - Parameters:
    - observed_sr – (float) Sharpe ratio that is being tested.
    - sr_estimates – (list) List of Sharpe ratio estimate trials, or a properties list [standard deviation of estimates, number of estimates] if the estimates_param flag is set to True.
    - number_of_returns – (int) Times returns are recorded for observed_sr.
    - skewness_of_returns – (float) Skewness of returns (0 by default).
    - kurtosis_of_returns – (float) Kurtosis of returns (3 by default).
    - estimates_param – (bool) Flag to use properties of estimates instead of the full list.
    - benchmark_out – (bool) Flag to output the calculated benchmark instead of DSR.
  - Returns:
    - (float) Deflated Sharpe ratio or benchmark SR (if benchmark_out).
Example
An example showing how the Deflated Sharpe Ratio function is used with a list of SR estimates:
>>> from mlfinlab.backtest_statistics.statistics import deflated_sharpe_ratio
>>> dsr = deflated_sharpe_ratio(1.2, [1.0, 1.1, 1.0], 200)
>>> dsr
1.0
Information Ratio
Calculates the annualized information ratio for a given pandas Series of normal or log returns.
It is the annualized ratio between the average excess return and the tracking error. The excess return is measured as the portfolio's return in excess of the benchmark's return. The tracking error is estimated as the standard deviation of the excess returns.
The benchmark should be provided as a return for the same time period as that between the input returns. For example, for daily observations it should be a benchmark of daily returns.
Calculated as:
\[ IR = \frac{E[r - r_{b}]}{\sigma[r - r_{b}]} \sqrt{n} \]
where \(r\) are the periodic returns, \(r_{b}\) is the benchmark return for the same period, and \(n\) is the number of return entries per year.
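As a quick illustration, below is a minimal pandas sketch of the ratio above using a short, made-up series of simple returns and a constant per-period benchmark (illustrative only; not the library's exact code):
>>> import numpy as np
>>> import pandas as pd
>>> rets = pd.Series([0.01, -0.005, 0.02, 0.0, 0.015])  # hypothetical daily returns
>>> excess = rets - 0.005               # excess over a 0.5% per-period benchmark
>>> ir = excess.mean() / excess.std() * np.sqrt(252)
>>> round(float(ir), 1)
4.6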
Implementation
- information_ratio(returns: Series, benchmark: float = 0, entries_per_year: int = 252) → float
  Calculates annualized information ratio for pd.Series of normal or log returns.
  The benchmark should be provided as a return for the same time period as that between the input returns. For example, for daily observations it should be a benchmark of daily returns.
  It is the annualized ratio between the average excess return and the tracking error. The excess return is measured as the portfolio's return in excess of the benchmark's return. The tracking error is estimated as the standard deviation of the excess returns.
  - Parameters:
    - returns – (pd.Series) Returns - normal or log.
    - benchmark – (float) Benchmark for performance comparison (0 by default).
    - entries_per_year – (int) Times returns are recorded per year (252 by default).
  - Returns:
    - (float) Annualized information ratio.
Example
An example showing how the Annualized Information Ratio function is used with monthly returns data:
>>> from mlfinlab.backtest_statistics.statistics import information_ratio
>>> information_r = information_ratio(returns, benchmark=0.005, entries_per_year=12)
>>> information_r
-0.6...
Profit Hurdle
This algorithm calculates the Required Mean Return of a strategy at a given level of significance, adjusted for multiple testing.
The method described below works only with characteristics of monthly returns that have no autocorrelation.
The inputs of the method include information about the returns data. In particular:
- The number of tests in multiple testing allowed (described in the first two papers from the introduction).
- The number of monthly returns observed.
- The significance level.
- The annual return volatility.
- The average correlation among strategy returns.
Adjustment methods include:
- Bonferroni
- Holm
- Benjamini, Hochberg, and Yekutieli (BHY)
- Average of the methods above
The Profit Hurdle algorithm consists of the following steps:
1. We are given the significance level \(p\), the strategy volatility \(\sigma\), the number of observations \(T\), and the number of tests that have been conducted \(N\).
2. Using the HLZ model, we generate \(N\) t-statistics, assuming that the average correlation of returns is \(\rho\).
3. Using the two methods (Holm and BHY), we calculate the threshold t-statistic that matches the \(p\) significance level.
4. Steps 2-3 are repeated multiple times (simulations).
5. For each of the two methods (Holm and BHY) we have a set of t-statistics. We take the median of the t-statistics in each set and call it the t-statistic of the method. The t-statistic for Bonferroni is calculated based on \(p\) and \(N\), as in the previous algorithm (Haircut Sharpe Ratios).
6. The obtained t-statistic of each method can then be transformed to a mean monthly return (a sketch of this conversion follows). We then calculate the Average as the mean of the methods' returns.
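To make the final conversion concrete, here is a by-hand cross-check of the independent-test hurdle from the example below, assuming the threshold t-statistic is converted to a mean monthly return (in percent) as \(ret = t \cdot (\sigma/\sqrt{12})/\sqrt{T}\); the conversion is an assumption on our part, not the library's exact code:
>>> import numpy as np
>>> from scipy.stats import norm
>>> t_threshold = norm.ppf(1 - 0.05 / 2)  # two-sided 5% threshold, no multiplicity
>>> monthly_ret = t_threshold * (0.1 / np.sqrt(12)) / np.sqrt(24) * 100
>>> round(float(monthly_ret), 2)  # minimum mean monthly return, in percent
1.15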
Implementation
The method returns an np.array of minimum average monthly returns, ordered as [Independent tests, Bonferroni, Holm, BHY, Average].
- class CampbellBacktesting(simulations=2000)
  This class implements the Haircut Sharpe Ratios and Profit Hurdles algorithms described in the following paper: Campbell R. Harvey and Yan Liu, Backtesting, (Fall 2015), Journal of Portfolio Management, 2015. The code is based on the code provided by the authors of the paper.
  The Haircut Sharpe Ratios algorithm lets the user adjust the observed Sharpe ratios to take multiple testing into account and calculate the corresponding haircuts. The haircut is the percentage difference between the original Sharpe ratio and the new Sharpe ratio.
  The Profit Hurdle algorithm lets the user calculate the required mean return for a strategy at a given level of significance, taking multiple testing into account.
  - __init__(simulations=2000)
    Set the desired number of simulations to make in the Haircut Sharpe Ratios or Profit Hurdle algorithms.
    - Parameters:
      - simulations – (int) Number of simulations.
- profit_hurdle(num_mult_test, num_obs, alpha_sig, vol_anu, rho)
  Calculates the required mean monthly return for a strategy at a given level of significance.
  This algorithm uses four adjustment methods - Bonferroni, Holm, BHY (Benjamini, Hochberg and Yekutieli) and the Average of them. The result is the Minimum Average Monthly Return for the strategy to be significant at the given significance level, taking multiple testing into account.
  This function doesn't allow for any autocorrelation in the strategy returns.
  - Parameters:
    - num_mult_test – (int) Number of tests in multiple testing allowed (number of other strategies tested).
    - num_obs – (int) Number of monthly observations for a strategy.
    - alpha_sig – (float) Significance level (e.g., 5%).
    - vol_anu – (float) Annual volatility of returns (e.g., 0.05 or 5%).
    - rho – (float) Average correlation among returns of strategies tested.
  - Returns:
    - (np.ndarray) Minimum Average Monthly Returns for [Independent tests, Bonferroni, Holm, BHY and Average for Multiple tests].
Example
An example showing how the Profit Hurdle method is used can be seen below:
>>> from mlfinlab.backtest_statistics.backtests import CampbellBacktesting
>>> # Specify the desired number of simulations
>>> backtesting = CampbellBacktesting(4000)
>>> # In this example: monthly observations of returns over two years (24 total
>>> # observations), 10 other strategies tested, a significance level of 5%, 10%
>>> # annual volatility, and an average correlation among returns of 0.4
>>> monthly_ret = backtesting.profit_hurdle(
... num_mult_test=10, num_obs=24, alpha_sig=0.05, vol_anu=0.1, rho=0.4
... )
>>> # Minimum Average Monthly Returns, ordered as documented:
>>> # [Independent, Bonferroni, Holm, BHY, Average]
>>> monthly_ret_independent = monthly_ret[0]
>>> monthly_ret_independent
1.1...
>>> monthly_ret_bonferroni = monthly_ret[1]
>>> monthly_ret_bonferroni
1.6...
>>> monthly_ret_holm = monthly_ret[2]
>>> monthly_ret_holm
1.2...
>>> monthly_ret_bhy = monthly_ret[3]
>>> monthly_ret_bhy
1.2...
Minimum Track Record Length
Calculates the Minimum Track Record Length (MinTRL) - "How long should a track record be in order to have statistical confidence that its Sharpe ratio is above a given threshold?"
If a track record is shorter than MinTRL, we do not have enough confidence that the observed Sharpe ratio \(\widehat{SR}\) is above the designated Sharpe ratio threshold.
MinTRL is expressed in terms of the number of observations, not annual or calendar terms.
The Minimum Track Record Length is calculated as:
\[ MinTRL = 1 + \left[1 - \gamma_3\widehat{SR} + \frac{\gamma_4 - 1}{4}\widehat{SR}^2\right]\left(\frac{Z_{\alpha}}{\widehat{SR} - SR^{*}}\right)^2 \]
Where:
- \(SR^{*}\) - benchmark Sharpe ratio
- \(\widehat{SR}\) - estimate of the Sharpe ratio
- \(Z_{\alpha}\) - Z-score of the desired significance level
- \(\gamma_3\) - skewness of the returns
- \(\gamma_4\) - kurtosis of the returns
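Below is a by-hand cross-check of the formula, reproducing the minimum_track_record_length(1.2, 1.0) call from the example below (a sketch under the default skewness of 0, kurtosis of 3, and alpha of 0.05):
>>> import numpy as np
>>> from scipy.stats import norm
>>> sr_obs, sr_star, skew, kurt, alpha = 1.2, 1.0, 0.0, 3.0, 0.05
>>> z_alpha = norm.ppf(1 - alpha)  # ~1.645 for a 5% significance level
>>> min_trl = 1 + (1 - skew * sr_obs + (kurt - 1) / 4 * sr_obs**2) * (
...     z_alpha / (sr_obs - sr_star)) ** 2
>>> round(float(min_trl), 1)
117.3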
Implementation
- minimum_track_record_length(observed_sr: float, benchmark_sr: float, skewness_of_returns: float = 0, kurtosis_of_returns: float = 3, alpha: float = 0.05) → float
  Calculates the minimum track record length (MinTRL) - "How long should a track record be in order to have statistical confidence that its Sharpe ratio is above a given threshold?"
  If a track record is shorter than MinTRL, we do not have enough confidence that the observed Sharpe ratio is above the designated Sharpe ratio threshold.
  MinTRL is expressed in terms of the number of observations, not annual or calendar terms.
  - Parameters:
    - observed_sr – (float) Sharpe ratio that is being tested.
    - benchmark_sr – (float) Sharpe ratio to which observed_sr is tested against.
    - skewness_of_returns – (float) Skewness of returns (0 by default).
    - kurtosis_of_returns – (float) Kurtosis of returns (3 by default).
    - alpha – (float) Desired significance level (0.05 by default).
  - Returns:
    - (float) Minimum track record length, in number of observations.
Example
An example showing how the Minimum Track Record Length function is used:
>>> from mlfinlab.backtest_statistics.statistics import minimum_track_record_length
>>> min_record_length = minimum_track_record_length(1.2, 1.0)
>>> min_record_length
117.3...
Bets Concentration
Concentration of returns measures the uniformity of returns from bets. The metric is inspired by the Herfindahl-Hirschman Index (HHI) and is calculated as follows:
\[ hhi = \frac{\sum_{i} w_{i}^{2} - \frac{1}{N}}{1 - \frac{1}{N}}, \quad w_{i} = \frac{r_{i}}{\sum_{j} r_{j}} \]
where \(r_{i}\) are the returns from bets and \(N\) is the number of returns.
The closer the concentration is to 0, the more uniform the distribution of returns (when 0, returns are uniform). If the concentration value is close to 1, returns are highly concentrated (when 1, there is only one non-zero return).
Returns \(nan\) if there are fewer than 3 returns in the series.
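Below is a minimal sketch of the normalized HHI above on a short, made-up series of bet returns (illustrative only; note the library computes positive and negative returns separately, as described in the All Bets Concentration section):
>>> import pandas as pd
>>> rets = pd.Series([0.1, 0.05, -0.02, 0.03, 0.2])  # hypothetical returns from bets
>>> weights = rets / rets.sum()                      # return weights
>>> hhi = (weights ** 2).sum()                       # raw Herfindahl-Hirschman Index
>>> hhi_norm = (hhi - 1 / len(rets)) / (1 - 1 / len(rets))
>>> round(float(hhi_norm), 2)
0.27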
Implementation
- bets_concentration(returns: Series) → float
  Advances in Financial Machine Learning, Snippet 14.3, page 201.
  Derives the concentration of returns from a given pd.Series of returns.
  The algorithm is based on the Herfindahl-Hirschman Index, where return weights are taken as an input.
  - Parameters:
    - returns – (pd.Series) Returns from bets.
  - Returns:
    - (float) Concentration of returns (nan if fewer than 3 returns).
Example
An example showing how the Bets Concentration function is used can be seen below:
>>> from mlfinlab.backtest_statistics.statistics import bets_concentration
>>> concentration = bets_concentration(returns)
>>> concentration
0.29...
All Bets Concentration
Concentration of returns measures the uniformity of returns from bets. The metric is inspired by the Herfindahl-Hirschman Index and is calculated as in the Bets Concentration section above.
The closer the concentration is to 0, the more uniform the distribution of returns (when 0, returns are uniform). If the concentration is close to 1, returns are highly concentrated (when 1, there is only one non-zero return).
This function calculates the concentration separately for positive returns, for negative returns, and for bets grouped by time intervals (daily, monthly, etc.):
- If the concentration of positive returns is low, there is no right fat tail in the returns distribution.
- If the concentration of negative returns is low, there is no left fat tail in the returns distribution.
- If fewer than 3 observations remain after time grouping, the third element is returned as nan.
Implementation
- all_bets_concentration(returns: Series, frequency: str = 'M') → tuple
  Advances in Financial Machine Learning, Snippet 14.3, page 201.
  Given a pd.Series of returns, derives the concentration of positive returns, negative returns, and the concentration of bets grouped by time intervals (daily, monthly, etc.). If fewer than 3 observations remain after time grouping, returns nan.
  Properties of the results:
  - low positive_concentration ⇒ no right fat-tail of returns (desirable)
  - low negative_concentration ⇒ no left fat-tail of returns (desirable)
  - low time_concentration ⇒ bets are not concentrated in time, or are evenly concentrated (desirable)
  - positive_concentration == 0 ⇔ returns are uniform
  - positive_concentration == 1 ⇔ only one non-zero return exists
  - Parameters:
    - returns – (pd.Series) Returns from bets.
    - frequency – (str) Desired time grouping frequency from pd.Grouper.
  - Returns:
    - (tuple of floats) Concentration of positive returns, negative returns, and time-grouped concentration.
Example
An example showing how the All Bets Concentration function is used with daily and weekly grouped data:
>>> import pandas as pd
>>> import numpy as np
>>> from mlfinlab.backtest_statistics.statistics import all_bets_concentration
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/sample_dollar_bars.csv"
>>> returns = pd.read_csv(url, index_col="date_time")
>>> returns.index = pd.to_datetime(returns.index)
>>> returns = np.log(returns["close"]).diff()[1:]
>>> pos_concentr, neg_concentr, week_concentr = all_bets_concentration(
... returns, frequency="D"
... )
>>> pos_concentr
0.0002...
>>> neg_concentr
0.0001...
>>> week_concentr
0.001...
>>> # The same function with weekly time grouping
>>> pos_concentr, neg_concentr, week_concentr = all_bets_concentration(
...     returns, frequency="W"
... )
Drawdown and Time Under Water
Intuitively, a drawdown is the maximum loss suffered by an investment between two consecutive high-watermarks.
The time under water is the time elapsed between a high watermark and the moment the PnL (profit and loss) exceeds the previous maximum PnL.
The function takes a series of cumulative returns or account balance as input. The balance can be in dollars or another currency, in which case the function returns the drawdowns in the same units.
The function returns two series:
- Drawdown series: the index is the time of a high watermark, and the value is the size of the drawdown after it.
- Time under water series: the index is the time of a high watermark, and the value is how much time passed until the next high watermark was reached, in years. It also includes the time between the last high watermark and the last observation in the returns as the last Time under water element; without this element, the estimates of Time under water could be biased.
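The core of the calculation is the running high watermark. Below is a minimal sketch of that idea on a made-up cumulative PnL series (illustrative only; the library additionally groups observations between consecutive high watermarks):
>>> import pandas as pd
>>> pnl = pd.Series([1.0, 1.2, 1.1, 1.3, 0.9, 1.4])  # hypothetical cumulative PnL
>>> hwm = pnl.expanding().max()                      # running high watermark
>>> drawdown = hwm - pnl                             # loss relative to the last peak
>>> round(float(drawdown.max()), 2)
0.4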
Implementation
- drawdown_and_time_under_water(returns: Series, dollars: bool = False) → tuple
  Advances in Financial Machine Learning, Snippet 14.4, page 201.
  Calculates drawdowns and time under water for a pd.Series of either the relative price of a portfolio or the dollar price of a portfolio.
  Intuitively, a drawdown is the maximum loss suffered by an investment between two consecutive high-watermarks. The time under water is the time elapsed between a high watermark and the moment the PnL (profit and loss) exceeds the previous maximum PnL. We also append the Time under water series with the period from the last high-watermark to the last return observed.
  Return details:
  - Drawdown series index is the time of a high watermark, and the value is the drawdown after it.
  - Time under water index is the time of a high watermark, and the value is how much time passed until the next high watermark, in years. Also includes the time between the last high watermark and the last observation in returns as the last element.
  - Parameters:
    - returns – (pd.Series) Returns from bets.
    - dollars – (bool) Flag if given dollar performance and not returns. If dollars, then drawdowns are in dollars, else as a %.
  - Returns:
    - (tuple of pd.Series) Series of drawdowns and time under water.
Example
An example showing how the Drawdown and Time Under Water function is used with account data in dollars:
>>> import pandas as pd
>>> import numpy as np
>>> from mlfinlab.backtest_statistics.statistics import drawdown_and_time_under_water
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/sample_dollar_bars.csv"
>>> returns = pd.read_csv(url, index_col="date_time")
>>> returns.index = pd.to_datetime(returns.index)
>>> returns = np.log(returns["close"]).diff()[1:]
>>> drawdown, tuw = drawdown_and_time_under_water(returns, dollars=True)
>>> drawdown
date_time
2011-08-01 02:55:17.443 0.004413
2011-08-01 10:51:41.842 0.012220
2011-08-05 07:37:13.880 0.014013
2011-08-05 12:30:19.803 0.038670
2011-08-09 05:52:50.256 0.031204
2011-11-28 00:40:55.625 0.026510
2012-01-03 11:02:25.863 0.028292
dtype: float64
>>> tuw
date_time
2011-08-01 02:55:17.443 0.000906
2011-08-01 10:51:41.842 0.010589
2011-08-05 07:37:13.880 0.000558
2011-08-05 12:30:19.803 0.010203
2011-08-09 05:52:50.256 0.303516
2011-11-28 00:40:55.625 0.099813
2012-01-03 11:02:25.863 0.572930
dtype: float64
Average Holding Period
Estimates the average holding period of a strategy from a series of target positions. The parameters of the algorithm are calculated as follows:
1. When the size of the position is increasing: updating \(EntryTime\) - the time when the trade was opened - adjusted by increases in the position. This takes into account the weight of the position increase.
2. When the size of the position is decreasing: capturing the \(HoldingTime = CurrentTime - EntryTime\) as well as the \(Weight\) of the closed position. If the entire position is closed, setting \(EntryTime\) to \(CurrentTime\).
3. Finally, calculating the weighted average holding time using the values captured in step 2.
If there are no closed trades in the series, the output is \(nan\).
Implementation
- average_holding_period(target_positions: Series) → float
  Advances in Financial Machine Learning, Snippet 14.2, page 197.
  Estimates the average holding period (in days) of a strategy, given a pandas Series of target positions, using the average entry time pairing algorithm.
  Idea of the algorithm:
  - entry_time = (previous_time * weight_of_previous_position + time_since_beginning_of_trade * increase_in_position) / weight_of_current_position
  - holding_period ['holding_time' = time a position was held, 'weight' = weight of position closed]
  - res = weighted average time a trade was held
  - Parameters:
    - target_positions – (pd.Series) Target position series with timestamps as indices.
  - Returns:
    - (float) Estimated average holding period, NaN if zero or unpredicted.
Example
An example showing how the Average Holding Period function is used with a series of target positions:
>>> import pandas as pd
>>> import numpy as np
>>> import datetime as dt
>>> from mlfinlab.backtest_statistics.statistics import average_holding_period
>>> hold_positions = np.array([0, 1, 1, -1, -1, 0, 0, 2, 2, 0])
>>> dates = np.array(
... [dt.datetime(2000, 1, 1) + i * dt.timedelta(days=1) for i in range(10)]
... )
>>> target_positions = pd.Series(data=hold_positions, index=dates)
>>> avg_holding_period = average_holding_period(target_positions)
>>> avg_holding_period
2.0
In this example each of the three round-trip trades (long one unit, short one unit, long two units) is held for two days, so the weighted average holding period is 2.0.
Flattening and Flips
Points of Flipping: when the target position changes sign (for example, changing from 1.5 (long position) to -0.5 (short position) on the next timestamp).
Points of Flattening: when the target position changes from non-zero to zero (for example, changing from 1.5 (long position) to 0 (no position) on the next timestamp).
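A minimal sketch of detecting these points with pandas (illustrative only; not the library's exact code):
>>> import pandas as pd
>>> pos = pd.Series([1.0, 1.5, -0.5, 0.0, 2.0])  # hypothetical target positions
>>> flips = pos[(pos * pos.shift(1)) < 0].index  # sign change between steps
>>> flattenings = pos[(pos == 0) & (pos.shift(1) != 0)].index  # non-zero to zero
>>> list(flips), list(flattenings)
([2], [3])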
Implementation
- timing_of_flattening_and_flips(target_positions: Series) → DatetimeIndex
  Advances in Financial Machine Learning, Snippet 14.1, page 197.
  Derives the timestamps of flattening or flipping trades from a pandas Series of target positions. Can be used for the analysis of position changes, such as the frequency and balance of position changes.
  Flattenings - times when an open position is being closed (final target position is 0). Flips - times when a positive position is reversed to a negative one and vice versa.
  - Parameters:
    - target_positions – (pd.Series) Target position series with timestamps as indices.
  - Returns:
    - (pd.DatetimeIndex) Timestamps of trades flattening, flipping and the last bet.
Example
An example showing how the Flattening and Flips function is used can be seen below:
>>> import pandas as pd
>>> import numpy as np
>>> import datetime as dt
>>> from mlfinlab.backtest_statistics.statistics import timing_of_flattening_and_flips
>>> flip_positions = np.array([1.0, 1.5, 0.5, 0, -0.5, -1.0, 0.5, 1.5, 1.5, 1.5])
>>> dates = np.array(
... [dt.datetime(2000, 1, 1) + i * dt.timedelta(days=1) for i in range(10)]
... )
>>> target_positions = pd.Series(data=flip_positions, index=dates)
>>> flattening_and_flips_timestamps = timing_of_flattening_and_flips(target_positions)
>>> flattening_and_flips_timestamps
DatetimeIndex...
Research Notebooks
The following research notebooks can be used to better understand how the statistics within this module can be used on real data.
- Advances in Financial Machine Learning, Chapter 14 Exercise Notebook
- Backtesting by Campbell and Liu
- Chapter 15: Understanding Strategy Risk
Presentation Slides
Note
pg 14-26: Backtesting Methods
Note
pg 1-17: Backtesting Statistics
pg 18-38: Type I and Type II Errors
pg 39-44: Understanding Strategy Risk
pg 47-73: Deep dive into the Sharpe Ratio