Bet Sizing in ML

“There are fascinating parallels between strategy games and investing. Some of the best portfolio managers I have worked with are excellent poker players, perhaps more so than chess players. One reason is bet sizing, for which Texas Hold’em provides a great analogue and training ground. Your ML algorithm can achieve high accuracy, but if you do not size your bets properly, your investment strategy will inevitably lose money. In this chapter we will review a few approaches to size bets from ML predictions.” Advances in Financial Machine Learning, Chapter 10: Bet Sizing, pg 141.

The code in this directory falls under 3 submodules:

Bet Sizing: We have extended the code from the book into an easy-to-use format for practitioners.
EF3M: An implementation of the EF3M algorithm.
Chapter10_Snippets: Documented and adjusted snippets from the book for users to experiment with.

Note

Underlying Literature

The following sources describe this method in more detail:

Advances in Financial Machine Learning, Chapter 10 by Marcos Lopez de Prado.

Bet Sizing Methods

Functions for bet sizing are implemented based on the approaches described in Chapter 10.

Bet Sizing From Predicted Probability

Assuming a machine learning algorithm has predicted a series of investment positions, one can use the probabilities of each of these predictions to derive the size of that specific bet.

bet_size_probability(events, prob, num_classes, pred=None, step_size=0.0, average_active=False, num_threads=1)

Calculates the bet size using the predicted probability. Note that if ‘average_active’ is True, the returned pandas.Series will be twice the length of the original since the average is calculated at each bet’s open and close.

Parameters:

events – (pd.DataFrame) Contains at least the column ‘t1’, the expiry datetime of the product. The index is a datetime index holding the datetime at which each position was taken.
prob – (pd.Series) The predicted probability.
num_classes – (int) The number of predicted bet sides.
pred – (pd.Series) The predicted bet side. Default value is None which will return a relative bet size (i.e. without multiplying by the side).
step_size – (float) The step size at which the bet size is discretized, default is 0.0 which imposes no discretization.
average_active – (bool) Option to average the size of active bets, default value is False.
num_threads – (int) The number of processing threads to utilize for multiprocessing, default value is 1.

Returns:

(pd.Series) The bet size, with the time index.

Dynamic Bet Sizes

Assuming one has a series of forecasted prices for a given investment product, that forecast and the current market price and position can be used to dynamically calculate the bet size.

bet_size_dynamic(current_pos, max_pos, market_price, forecast_price, cal_divergence=10, cal_bet_size=0.95, func='sigmoid')

Calculates the bet sizes, target position, and limit price as the market price and forecast price fluctuate. The current position, maximum position, market price, and forecast price can be passed as separate pandas.Series (with a common index), as individual numbers, or a combination thereof. If any one of the aforementioned arguments is a pandas.Series, the other arguments will be broadcast to a pandas.Series of the same length and index.

Parameters:

current_pos – (pd.Series or int) Current position.
max_pos – (pd.Series or int) Maximum position.
market_price – (pd.Series or float) Market price.
forecast_price – (pd.Series or float) Forecast price.
cal_divergence – (float) The divergence to use in calibration.
cal_bet_size – (float) The bet size to use in calibration.
func – (str) Function to use for dynamic calculation. Valid options are: ‘sigmoid’, ‘power’.

Returns:

(pd.DataFrame) Bet size (bet_size), target position (t_pos), and limit price (l_p).
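As a rough illustration of what bet_size_dynamic computes for func='sigmoid', the sketch below broadcasts scalar inputs against any pandas.Series, calibrates the regulating coefficient w from cal_divergence and cal_bet_size, and derives the bet size and target position. The function name sketch_dynamic_sigmoid is hypothetical and this is not the mlfinlab source. It omits the limit price, and therefore the current position. The sigmoid form m = x / sqrt(w + x^2) and the calibration w = x^2 (m^-2 - 1) follow Snippet 10.4.

```python
import numpy as np
import pandas as pd


def sketch_dynamic_sigmoid(max_pos, market_price, forecast_price,
                           cal_divergence=10.0, cal_bet_size=0.95):
    """Hypothetical sketch of sigmoid dynamic bet sizing (no limit price)."""
    inputs = {'max_pos': max_pos, 'm_p': market_price, 'f': forecast_price}
    # Use the index of the first Series argument; fall back to a single row.
    index = next((v.index for v in inputs.values() if isinstance(v, pd.Series)),
                 pd.RangeIndex(1))
    frame = pd.DataFrame({k: v if isinstance(v, pd.Series) else pd.Series(v, index=index)
                          for k, v in inputs.items()})
    # Calibrate w so that a divergence of cal_divergence yields cal_bet_size.
    w_param = cal_divergence ** 2 * (cal_bet_size ** -2 - 1)
    price_div = frame['f'] - frame['m_p']
    frame['bet_size'] = price_div / np.sqrt(w_param + price_div ** 2)
    # Truncate toward zero, as int() does in the book's target-position snippet.
    frame['t_pos'] = np.trunc(frame['bet_size'] * frame['max_pos']).astype(int)
    return frame[['bet_size', 't_pos']]


result = sketch_dynamic_sigmoid(100, pd.Series([100.0, 100.0]), pd.Series([115.0, 85.0]))
print(result)
```

Because the sigmoid is odd in the divergence, forecasts 15 above and 15 below the market price give target positions of equal magnitude and opposite sign.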

Strategy-Independent Bet Sizing Approaches

These approaches consider the number of concurrent active bets and their sides, and set the bet size in such a way that some cash is reserved for the possibility that the trading signal strengthens before it weakens.

bet_size_budget(events_t1, sides)

Calculates a bet size from the bet sides and start and end times. These sequences are used to determine the number of concurrent long and short bets. At any given time, the resulting strategy-independent bet size is the number of active long bets divided by its maximum over the dataset, minus the number of active short bets divided by its maximum. This strategy is based on Section 10.2 of “Advances in Financial Machine Learning”. It creates a linear bet sizing scheme that is aligned to the expected number of concurrent bets in the dataset.

Parameters:

events_t1 – (pd.Series) The end datetime of the position with the start datetime as the index.
sides – (pd.Series) The side of the bet with the start datetime as index. Index must match the ‘events_t1’ argument exactly. Bet sides less than zero are interpreted as short, bet sides greater than zero are interpreted as long.

Returns:

(pd.DataFrame) The ‘events_t1’ and ‘sides’ arguments as columns, with the number of concurrent active long and short bets, as well as the bet size, in additional columns.
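The budgeting idea from Section 10.2 divides the number of concurrent long bets by its maximum over the sample, does the same for short bets, and takes the difference. The sketch below assumes the concurrency counts are already available in columns named active_long and active_short (these column names are an assumption). The helper name budget_sizes is hypothetical and this is not the mlfinlab source.

```python
import pandas as pd


def budget_sizes(counts):
    """Hypothetical sketch: m_t = c_long / max(c_long) - c_short / max(c_short)."""
    max_long = counts['active_long'].max()
    max_short = counts['active_short'].max()
    # Guard against a sample with no long (or no short) bets at all.
    frac_long = counts['active_long'] / max_long if max_long > 0 else 0.0
    frac_short = counts['active_short'] / max_short if max_short > 0 else 0.0
    return frac_long - frac_short


counts = pd.DataFrame({'active_long': [1, 2, 2, 0], 'active_short': [0, 0, 1, 1]})
sizes = budget_sizes(counts)
print(sizes.tolist())
```

Each count enters the bet size through a fixed scaling factor, which is why the scheme is linear in the number of concurrent bets.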

bet_size_reserve(events_t1, sides, fit_runs=100, epsilon=1e-05, factor=5, variant=2, max_iter=10000, num_workers=1, return_parameters=False)

Calculates the bet size from bet sides and start and end times. These sequences are used to determine the number of concurrent long and short bets, and the difference between the two at each time step, c_t. A mixture of two Gaussian distributions is fit to the distribution of c_t, which is then used to determine the bet size. This strategy results in a sigmoid-shaped bet sizing response aligned to the expected number of concurrent long and short bets in the dataset.

Note that this function creates an mlfinlab.bet_sizing.ef3m.M2N object and makes use of its parallel fitting functionality. As such, this function accepts fitting parameters and passes them to the mlfinlab.bet_sizing.ef3m.M2N.mp_fit method.

Parameters:

events_t1 – (pd.Series) The end datetime of the position with the start datetime as the index.
sides – (pd.Series) The side of the bet with the start datetime as index. Index must match the ‘events_t1’ argument exactly. Bet sides less than zero are interpreted as short, bet sides greater than zero are interpreted as long.
fit_runs – (int) Number of runs to execute when trying to fit the distribution.
epsilon – (float) Error tolerance.
factor – (float) Lambda factor from equations.
variant – (int) Which algorithm variant to use, 1 or 2.
max_iter – (int) Maximum number of iterations after which to terminate loop.
num_workers – (int) Number of CPU cores to use for multiprocessing execution, set to -1 to use all CPU cores. Default is 1. Note: This currently has no effect.
return_parameters – (bool) If True, function also returns a dictionary of the fitted mixture parameters.

Returns:

(pd.DataFrame) The ‘events_t1’ and ‘sides’ arguments as columns, with the number of concurrent active long and short bets, the difference between long and short, and the bet size in additional columns. Also returns the mixture parameters if ‘return_parameters’ is set to True.

Additional Utility Functions For Bet Sizing

confirm_and_cast_to_df(d_vars)

Accepts either pandas.Series (with a common index) or integer/float values, casts all non-pandas.Series values to Series, and returns a pandas.DataFrame for further calculations. This is a helper function to the ‘bet_size_dynamic’ function.

Parameters:

d_vars – (dict) A dictionary where the values are either pandas.Series or single int/float values. All pandas.Series passed are assumed to have the same index. The keys of the dictionary will be used for column names in the returned pandas.DataFrame.

Returns:

(pd.DataFrame) The values from the input dictionary in pandas.DataFrame format, with dictionary keys as column names.
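The broadcasting behaviour described for confirm_and_cast_to_df can be sketched as follows. The name cast_to_df is hypothetical. Falling back to a single-row RangeIndex when no pandas.Series is supplied is an assumption of this sketch rather than documented behaviour.

```python
import pandas as pd


def cast_to_df(d_vars):
    """Hypothetical sketch: broadcast scalars onto the index of the first Series."""
    series = [v for v in d_vars.values() if isinstance(v, pd.Series)]
    # Assumption: with no Series at all, build a one-row frame.
    index = series[0].index if series else pd.RangeIndex(1)
    columns = {key: val if isinstance(val, pd.Series) else pd.Series(val, index=index)
               for key, val in d_vars.items()}
    # Dictionary keys become column names, in insertion order.
    return pd.DataFrame(columns)


frame = cast_to_df({'pos': 0, 'max_pos': 10,
                    'm_p': pd.Series([100.0, 101.0], index=['a', 'b'])})
print(frame)
```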

get_concurrent_sides(events_t1, sides)

Given the side of each position along with its start and end timestamps, this function counts the number of concurrent long and short bets at each timestamp and returns the counts as two additional columns of a pandas.DataFrame.

Parameters:

events_t1 – (pd.Series) The end datetime of the position with the start datetime as the index.
sides – (pd.Series) The side of the bet with the start datetime as index. Index must match the ‘events_t1’ argument exactly. Bet sides less than zero are interpreted as short, bet sides greater than zero are interpreted as long.

Returns:

(pd.DataFrame) The ‘events_t1’ and ‘sides’ arguments as columns, with two additional columns indicating the number of concurrent active long and active short bets at each timestamp.
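A minimal single-process sketch of the concurrency count follows. It assumes the activity rule used in Snippet 10.2: a bet is active at time t when it started at or before t and its end time is after t or still unknown (NaT). The function name count_concurrent_sides and the column names are illustrative assumptions, not the mlfinlab source.

```python
import pandas as pd


def count_concurrent_sides(events_t1, sides):
    """Hypothetical sketch: count active long and short bets at each bet's start time."""
    out = pd.DataFrame({'t1': events_t1, 'side': sides})
    longs, shorts = [], []
    for t in out.index:
        # Active at t: started at or before t, and ends after t or has no end yet.
        started = pd.Series(out.index <= t, index=out.index)
        still_open = (out['t1'] > t) | out['t1'].isna()
        active = started & still_open
        longs.append(int((active & (out['side'] > 0)).sum()))
        shorts.append(int((active & (out['side'] < 0)).sum()))
    out['active_long'] = longs
    out['active_short'] = shorts
    return out


idx = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'])
events_t1 = pd.Series(pd.to_datetime(['2020-01-03', '2020-01-04', '2020-01-05']), index=idx)
sides = pd.Series([1, -1, 1], index=idx)
counts = count_concurrent_sides(events_t1, sides)
print(counts)
```

The loop is quadratic in the number of bets, which is fine for illustration but is the reason library implementations tend to use multiprocessing.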

cdf_mixture(x_val, parameters)

The cumulative distribution function of a mixture of 2 normal distributions, evaluated at x_val.

Parameters:

x_val – (float) Value at which to evaluate the CDF.
parameters – (list) The parameters of the mixture, [mu_1, mu_2, sigma_1, sigma_2, p_1].

Returns:

(float) CDF of the mixture.

single_bet_size_mixed(c_t, parameters)

Returns the bet size for a single observation, following the description in question 10.4(c), given the difference in concurrent long and short positions, c_t, and the fitted parameters of the mixture of two Gaussian distributions.

Parameters:

c_t – (int) The difference in the number of concurrent long bets minus short bets.
parameters – (list) The parameters of the mixture, [mu_1, mu_2, sigma_1, sigma_2, p_1].

Returns:

(float) Bet size.
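The mixture CDF and the resulting bet size for a given c_t can be sketched with scipy.stats.norm. The function names are hypothetical. The parameter order [mu_1, mu_2, sigma_1, sigma_2, p_1] follows the documentation above, and the scaling (F[c_t] - F[0]) / (1 - F[0]) for c_t >= 0 and (F[c_t] - F[0]) / F[0] for c_t < 0 is the mixture-based rule from Section 10.2 of the book.

```python
from scipy.stats import norm


def mixture_cdf(x_val, parameters):
    """CDF of p_1 * N(mu_1, sigma_1) + (1 - p_1) * N(mu_2, sigma_2) at x_val."""
    mu_1, mu_2, sigma_1, sigma_2, p_1 = parameters
    return (p_1 * norm.cdf(x_val, loc=mu_1, scale=sigma_1)
            + (1 - p_1) * norm.cdf(x_val, loc=mu_2, scale=sigma_2))


def mixed_bet_size(c_t, parameters):
    """Hypothetical helper: rescale F(c_t) - F(0) so that sizes lie in [-1, 1]."""
    f_zero = mixture_cdf(0.0, parameters)
    f_c = mixture_cdf(c_t, parameters)
    if c_t >= 0:
        return (f_c - f_zero) / (1 - f_zero)
    return (f_c - f_zero) / f_zero


params = [-2.0, 2.0, 1.0, 1.0, 0.5]  # illustrative values, not a fitted mixture
print(mixed_bet_size(3, params), mixed_bet_size(-3, params))
```

With a symmetric mixture such as the one above, F(0) = 0.5 and the bet size is antisymmetric in c_t.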

Chapter 10 Code Snippets

Chapter 10 of Advances in Financial Machine Learning contains a number of Python code snippets, many of which are used to create the top level bet sizing functions. These functions can be found in mlfinlab.bet_sizing.ch10_snippets.py.

Snippets For Bet Sizing From Probabilities

get_signal(prob, num_classes, pred=None)

SNIPPET 10.1 - FROM PROBABILITIES TO BET SIZE. Calculates the size of the bet given the side and the probability (i.e. confidence) of the prediction. In this representation, the probability will always be between 1/num_classes and 1.0.

Parameters:

prob – (pd.Series) The probability of the predicted bet side.
num_classes – (int) The number of predicted bet sides.
pred – (pd.Series) The predicted bet side. Default value is None which will return a relative bet size (i.e. without multiplying by the side).

Returns:

(pd.Series) The bet size.
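The computation behind Snippet 10.1 can be sketched as follows. It tests the predicted probability p against the uninformed value 1/num_classes with z = (p - 1/num_classes) / sqrt(p(1 - p)) and maps z to a size in [-1, 1] via 2 * Phi(z) - 1. The name signal_from_prob is hypothetical and the sketch is not the mlfinlab implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def signal_from_prob(prob, num_classes, pred=None):
    """Hypothetical sketch of Snippet 10.1: probability -> bet size."""
    # Test statistic of H0: p = 1 / num_classes.
    z_stat = (prob - 1.0 / num_classes) / np.sqrt(prob * (1.0 - prob))
    # Map the statistic to [-1, 1] with the standard normal CDF.
    size = pd.Series(2 * norm.cdf(z_stat) - 1, index=prob.index)
    if pred is not None:
        size = pred * size  # apply the predicted side
    return size


prob = pd.Series([0.5, 0.7, 0.9])
relative = signal_from_prob(prob, num_classes=2)
signed = signal_from_prob(prob, num_classes=2, pred=pd.Series([1, -1, 1]))
print(relative.round(4).tolist(), signed.round(4).tolist())
```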

avg_active_signals(signals, num_threads=1)

SNIPPET 10.2 - BETS ARE AVERAGED AS LONG AS THEY ARE STILL ACTIVE. Function averages the bet sizes of all concurrently active bets. This function makes use of multiprocessing.

Parameters:

signals – (pd.DataFrame) Contains at least the columns ‘signal’ (the bet size) and ‘t1’ (the closing time of the bet). The index must be in datetime format.
num_threads – (int) Number of threads to use in multiprocessing, default value is 1.

Returns:

(pd.Series) The averaged bet sizes.

mp_avg_active_signals(signals, molecule)

Part of SNIPPET 10.2. A function to be passed to the ‘mp_pandas_obj’ function to allow the bet sizes to be averaged using multiprocessing.

At time loc, average the signals that are still active. A signal is active if (a) it was issued before or at loc, and (b) loc is before the signal’s end time, or the end time is still unknown (NaT).

Parameters:

signals – (pd.DataFrame) Contains at least the following columns: ‘signal’ (the bet size) and ‘t1’ (the closing time of the bet).
molecule – (list) Indivisible tasks to be passed to ‘mp_pandas_obj’, in this case a list of datetimes.

Returns:

(pd.Series) The averaged bet size sub-series.
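The averaging rule used by avg_active_signals and mp_avg_active_signals can be sketched in a single process. The function evaluates the average at every bet open and every known bet close, which is why the output can be up to twice as long as the input. The name average_active_signals is hypothetical and the sketch skips the multiprocessing layer entirely.

```python
import pandas as pd


def average_active_signals(signals):
    """Hypothetical single-process sketch of averaging concurrently active bet sizes."""
    # Evaluate at every bet open and every known bet close.
    t_points = sorted(set(signals['t1'].dropna()).union(signals.index))
    out = pd.Series(0.0, index=pd.DatetimeIndex(t_points))
    for loc in out.index:
        # Active: issued at or before loc, and loc is before t1 (or t1 is NaT).
        started = pd.Series(signals.index <= loc, index=signals.index)
        still_open = (signals['t1'] > loc) | signals['t1'].isna()
        active = started & still_open
        if active.any():
            out[loc] = signals.loc[active, 'signal'].mean()
    return out  # time points with no active bet keep a size of 0.0


idx = pd.to_datetime(['2020-01-01', '2020-01-02'])
signals = pd.DataFrame({'signal': [0.4, 0.8],
                        't1': pd.to_datetime(['2020-01-03', '2020-01-04'])}, index=idx)
averaged = average_active_signals(signals)
print(averaged)
```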

discrete_signal(signal0, step_size)

SNIPPET 10.3 - SIZE DISCRETIZATION TO PREVENT OVERTRADING. Discretizes the bet size signal based on the step size given.

Parameters:

signal0 – (pd.Series) The signal to discretize.
step_size – (float) Step size.

Returns:

(pd.Series) The discretized signal.
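Discretization in the spirit of Snippet 10.3 rounds each size to the nearest multiple of step_size and caps the result to [-1, 1]. The name discretize is hypothetical. Note that pandas rounding sends exact halves to the nearest even integer.

```python
import pandas as pd


def discretize(signal0, step_size):
    """Hypothetical sketch: round to multiples of step_size, then cap to [-1, 1]."""
    signal1 = (signal0 / step_size).round() * step_size
    return signal1.clip(lower=-1, upper=1)


stepped = discretize(pd.Series([0.12, -0.37, 0.96]), step_size=0.25)
print(stepped.tolist())
```

Small changes in the raw signal therefore leave the discretized size unchanged, which is what limits overtrading.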

Snippets for Dynamic Bet Sizing

bet_size_sigmoid(w_param, price_div)

Part of SNIPPET 10.4. Calculates the bet size from the price divergence and a regulating coefficient. Based on a sigmoid function for a bet size algorithm.

Parameters:

w_param – (float) Coefficient regulating the width of the bet size function.
price_div – (float) Price divergence, forecast price - market price.

Returns:

(float) The bet size.

get_target_pos_sigmoid(w_param, forecast_price, market_price, max_pos)

Part of SNIPPET 10.4. Calculates the target position given the forecast price, market price, maximum position size, and a regulating coefficient. Based on a sigmoid function for a bet size algorithm.

Parameters:

w_param – (float) Coefficient regulating the width of the bet size function.
forecast_price – (float) Forecast price.
market_price – (float) Market price.
max_pos – (int) Maximum absolute position size.

Returns:

(int) Target position.

inv_price_sigmoid(forecast_price, w_param, m_bet_size)

Part of SNIPPET 10.4. Calculates the inverse of the bet size with respect to the market price. Based on a sigmoid function for a bet size algorithm.

Parameters:

forecast_price – (float) Forecast price.
w_param – (float) Coefficient regulating the width of the bet size function.
m_bet_size – (float) Bet size.

Returns:

(float) Inverse of bet size with respect to market price.
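Squaring m = x / sqrt(w + x^2) and solving for the divergence x gives x = m * sqrt(w / (1 - m^2)), so the market price at which the sigmoid produces a bet size m is f - x. The sketch below (hypothetical names bet_size_sig and inv_price_sig) checks that the two functions invert each other.

```python
import math


def bet_size_sig(w_param, price_div):
    """Sigmoid bet size m = x / sqrt(w + x^2), with x = forecast - market."""
    return price_div / math.sqrt(w_param + price_div ** 2)


def inv_price_sig(forecast_price, w_param, m_bet_size):
    """Market price at which the sigmoid bet size equals m_bet_size (requires |m| < 1)."""
    # x = m * sqrt(w / (1 - m^2)) carries the sign of m; the price is p = f - x.
    return forecast_price - m_bet_size * math.sqrt(w_param / (1 - m_bet_size ** 2))


m = bet_size_sig(10.0, 115.0 - 100.0)
recovered = inv_price_sig(115.0, 10.0, m)
print(m, recovered)
```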

limit_price_sigmoid(target_pos, pos, forecast_price, w_param, max_pos)

Part of SNIPPET 10.4. Calculates the limit price. Based on a sigmoid function for a bet size algorithm.

Parameters:

target_pos – (int) Target position.
pos – (int) Current position.
forecast_price – (float) Forecast price.
w_param – (float) Coefficient regulating the width of the bet size function.
max_pos – (int) Maximum absolute position size.

Returns:

(float) Limit price.
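The limit price can be understood as the average breakeven price over every unit traded between the current and the target position, where unit j corresponds to a bet size of j / max_pos. The sketch below mirrors the loop in Snippet 10.4 but uses an explicit step direction so that it also walks correctly when reducing a position. The function names are hypothetical. The sketch assumes |target_pos| < max_pos, because a bet size of exactly ±1 would make 1 - m^2 zero.

```python
import math


def inv_price_sig(forecast_price, w_param, m_bet_size):
    """Market price at which the sigmoid bet size equals m_bet_size (|m| < 1)."""
    return forecast_price - m_bet_size * math.sqrt(w_param / (1 - m_bet_size ** 2))


def limit_price_sig(target_pos, pos, forecast_price, w_param, max_pos):
    """Average breakeven price over each unit traded from pos to target_pos."""
    if target_pos == pos:
        raise ValueError("target_pos equals pos, so there is nothing to trade")
    step = 1 if target_pos > pos else -1
    # Units pos+step, ..., target_pos; unit j corresponds to bet size j / max_pos.
    prices = [inv_price_sig(forecast_price, w_param, j / max_pos)
              for j in range(pos + step, target_pos + step, step)]
    return sum(prices) / len(prices)


w = 10.0 ** 2 * (0.95 ** -2 - 1)  # calibrated so a divergence of 10 maps to a size of 0.95
print(limit_price_sig(97, 0, 115.0, w, 100))
```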

get_w_sigmoid(price_div, m_bet_size)

Part of SNIPPET 10.4. Calculates the inverse of the bet size with respect to the regulating coefficient ‘w’. Based on a sigmoid function for a bet size algorithm.

Parameters:

price_div – (float) Price divergence, forecast price - market price.
m_bet_size – (float) Bet size.

Returns:

(float) Inverse of bet size with respect to the regulating coefficient.
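Solving m = x / sqrt(w + x^2) for w gives w = x^2 (m^-2 - 1). This calibrates the width of the sigmoid from one chosen (divergence, bet size) pair. The sketch below (hypothetical names) calibrates w so that a divergence of 10 yields a bet size of 0.95. It then computes the target position for a market price of 100, a forecast of 115 and a maximum position of 100. These numbers are illustrative.

```python
import math


def get_w_sig(price_div, m_bet_size):
    """Invert m = x / sqrt(w + x^2) for w."""
    return price_div ** 2 * (m_bet_size ** -2 - 1)


def bet_size_sig(w_param, price_div):
    """Sigmoid bet size m = x / sqrt(w + x^2)."""
    return price_div / math.sqrt(w_param + price_div ** 2)


def target_pos_sig(w_param, forecast_price, market_price, max_pos):
    """Target position, truncated toward zero with int()."""
    return int(bet_size_sig(w_param, forecast_price - market_price) * max_pos)


w = get_w_sig(10.0, 0.95)
print(w, bet_size_sig(w, 10.0), target_pos_sig(w, 115.0, 100.0, 100))
```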

bet_size_power(w_param, price_div)

Derived from SNIPPET 10.4. Calculates the bet size from the price divergence and a regulating coefficient. Based on a power function for a bet size algorithm.

Parameters:

w_param – (float) Coefficient regulating the width of the bet size function.
price_div – (float) Price divergence, forecast price - market price. Must be between -1 and 1, inclusive.

Returns:

(float) The bet size.
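The text does not spell out the power function. One natural form is m = sgn(x) |x|^w for a divergence x in [-1, 1]. This sketch assumes that form, and whether mlfinlab uses exactly this form is also an assumption. The name bet_size_pow is hypothetical, and w is assumed to be positive.

```python
import math


def bet_size_pow(w_param, price_div):
    """Assumed power bet size m = sgn(x) * |x| ** w for x in [-1, 1]."""
    if not -1.0 <= price_div <= 1.0:
        raise ValueError("price_div must be between -1 and 1, inclusive")
    # copysign keeps the side of the divergence on the magnitude |x| ** w.
    return math.copysign(abs(price_div) ** w_param, price_div)


print(bet_size_pow(2.0, 0.5), bet_size_pow(2.0, -0.5), bet_size_pow(0.5, 0.25))
```

Under this form, w > 1 makes the response flatter near zero divergence and w < 1 makes it steeper, in contrast to the sigmoid, whose width is controlled by w through sqrt(w + x^2).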

get_target_pos_power(w_param, forecast_price, market_price, max_pos)

Derived from SNIPPET 10.4. Calculates the target position given the forecast price, market price, maximum position size, and a regulating coefficient. Based on a power function for a bet size algorithm.

Parameters:

w_param – (float) Coefficient regulating the width of the bet size function.
forecast_price – (float) Forecast price.
market_price – (float) Market price.
max_pos – (float) Maximum absolute position size.

Returns:

(float) Target position.

inv_price_power(forecast_price, w_param, m_bet_size)

Derived from SNIPPET 10.4. Calculates the inverse of the bet size with respect to the market price. Based on a power function for a bet size algorithm.

Parameters:

forecast_price – (float) Forecast price.
w_param – (float) Coefficient regulating the width of the bet size function.
m_bet_size – (float) Bet size.

Returns:

(float) Inverse of bet size with respect to market price.

limit_price_power(target_pos, pos, forecast_price, w_param, max_pos)

Derived from SNIPPET 10.4. Calculates the limit price. Based on a power function for a bet size algorithm.

Parameters:

target_pos – (float) Target position.
pos – (float) Current position.
forecast_price – (float) Forecast price.
w_param – (float) Coefficient regulating the width of the bet size function.
max_pos – (float) Maximum absolute position size.

Returns:

(float) Limit price.

get_w_power(price_div, m_bet_size)

Derived from SNIPPET 10.4. Calculates the inverse of the bet size with respect to the regulating coefficient ‘w’. The ‘w’ coefficient must be greater than or equal to zero. Based on a power function for a bet size algorithm.

Parameters:

price_div – (float) Price divergence, forecast price - market price.
m_bet_size – (float) Bet size.

Returns:

(float) Inverse of bet size with respect to the regulating coefficient.
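This sketch uses the same assumed power form m = sgn(x) |x|^w, which may not match mlfinlab exactly. Solving |m| = |x|^w for the coefficient gives w = ln|m| / ln|x|. The hypothetical get_w_pow below restricts the inputs to 0 < |x| < 1 and 0 < |m| <= 1. With those restrictions both logarithms are defined, the denominator is negative and the numerator is not positive, so w comes out greater than or equal to zero, as the text requires.

```python
import math


def get_w_pow(price_div, m_bet_size):
    """Solve |m| = |x| ** w for w under the assumed power form."""
    if not (0 < abs(price_div) < 1 and 0 < abs(m_bet_size) <= 1):
        raise ValueError("need 0 < |price_div| < 1 and 0 < |m_bet_size| <= 1")
    # ln|m| <= 0 and ln|x| < 0, so the ratio is non-negative.
    return math.log(abs(m_bet_size)) / math.log(abs(price_div))


w = get_w_pow(0.5, 0.25)
print(w, 0.5 ** w)
```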

bet_size(w_param, price_div, func)

Derived from SNIPPET 10.4. Calculates the bet size from the price divergence and a regulating coefficient. The ‘func’ argument allows the user to choose between bet sizing functions.

Parameters:

w_param – (float) Coefficient regulating the width of the bet size function.
price_div – (float) Price divergence, forecast price - market price.
func – (string) Function to use for dynamic calculation. Valid options are: ‘sigmoid’, ‘power’.

Returns:

(float) The bet size.

get_target_pos(w_param, forecast_price, market_price, max_pos, func)

Derived from SNIPPET 10.4. Calculates the target position given the forecast price, market price, maximum position size, and a regulating coefficient. The ‘func’ argument allows the user to choose between bet sizing functions.

Parameters:

w_param – (float) Coefficient regulating the width of the bet size function.
forecast_price – (float) Forecast price.
market_price – (float) Market price.
max_pos – (int) Maximum absolute position size.
func – (string) Function to use for dynamic calculation. Valid options are: ‘sigmoid’, ‘power’.

Returns:

(int) Target position.

inv_price(forecast_price, w_param, m_bet_size, func)

Derived from SNIPPET 10.4. Calculates the inverse of the bet size with respect to the market price. The ‘func’ argument allows the user to choose between bet sizing functions.

Parameters:

forecast_price – (float) Forecast price.
w_param – (float) Coefficient regulating the width of the bet size function.
m_bet_size – (float) Bet size.
func – (string) Function to use for dynamic calculation. Valid options are: ‘sigmoid’, ‘power’.

Returns:

(float) Inverse of bet size with respect to market price.

limit_price(target_pos, pos, forecast_price, w_param, max_pos, func)

Derived from SNIPPET 10.4. Calculates the limit price. The ‘func’ argument allows the user to choose between bet sizing functions.

Parameters:

target_pos – (int) Target position.
pos – (int) Current position.
forecast_price – (float) Forecast price.
w_param – (float) Coefficient regulating the width of the bet size function.
max_pos – (int) Maximum absolute position size.
func – (string) Function to use for dynamic calculation. Valid options are: ‘sigmoid’, ‘power’.

Returns:

(float) Limit price.

get_w(price_div, m_bet_size, func)

Derived from SNIPPET 10.4. Calculates the inverse of the bet size with respect to the regulating coefficient ‘w’. The ‘func’ argument allows the user to choose between bet sizing functions.

Parameters:

price_div – (float) Price divergence, forecast price - market price.
m_bet_size – (float) Bet size.
func – (string) Function to use for dynamic calculation. Valid options are: ‘sigmoid’, ‘power’.

Returns:

(float) Inverse of bet size with respect to the regulating coefficient.

Research Notebook

The following research notebooks can be used to better understand bet sizing.

Chapter 10 Exercise Notebook

EF3M Algorithm Test Cases

Presentation Slides

Note

pg 1-9: Bet Sizing
