Exact Fit using first 3 Moments (EF3M)
The EF3M algorithm was introduced in a paper by Marcos Lopez de Prado and Matthew D. Foreman, titled “A mixture of Gaussians approach to mathematical portfolio oversight: the EF3M algorithm”.
The abstract reads: “An analogue can be made between: (a) the slow pace at which species adapt to an environment, which often results in the emergence of a new distinct species out of a once homogeneous genetic pool, and (b) the slow changes that take place over time within a fund, mutating its investment style. A fund’s track record provides a sort of genetic marker, which we can use to identify mutations. This has motivated our use of a biometric procedure to detect the emergence of a new investment style within a fund’s track record. In doing so, we answer the question: “What is the probability that a particular PM’s performance is departing from the reference distribution used to allocate her capital?” The EF3M approach, inspired by evolutionary biology, may help detect early stages of an evolutionary divergence in an investment style, and trigger a decision to review a fund’s capital allocation.”
The Exact Fit of the first 3 Moments (EF3M) algorithm estimates the parameters of a mixture of Gaussian distributions from the first 5 moments of the mixture distribution, under the assumption that the mixture is composed of a given number of Gaussian distributions.
A more thorough investigation into the algorithm can be found in one of our Research Notebooks.
Note
Underlying Literature
The following sources describe this method in more detail:
- A mixture of Gaussians approach to mathematical portfolio oversight: the EF3M algorithm, by Marcos Lopez de Prado and Matthew D. Foreman.
M2N Implementation
A class for determining the means, standard deviations, and mixture proportion of a given distribution from its first four or five statistical moments.
- class M2N(moments, epsilon=1e-05, factor=5, n_runs=1, variant=1, max_iter=100000, num_workers=-1)
-
M2N - A Mixture of 2 Normal distributions. This class contains the parameters and equations for the EF3M algorithm when fitting a mixture of 2 Gaussian distributions.
- Parameters:
-
-
moments – (list) The first five (1…5) raw moments of the mixture distribution.
-
epsilon – (float) Fitting tolerance.
-
factor – (float) Lambda factor from equations.
-
n_runs – (int) Number of times to execute ‘single_fit_loop’.
-
variant – (int) The EF3M variant to execute, options are 1: EF3M using first 4 moments, 2: EF3M using first 5 moments.
-
max_iter – (int) Maximum number of iterations to perform in the ‘fit’ method.
-
num_workers – (int) Number of CPU cores to use for multiprocessing execution. Default is -1 which sets num_workers to all cores.
-
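The ‘moments’ argument expects raw moments, that is, moments about the origin. As a minimal sketch, assuming the input is a plain list of observations, the first five raw sample moments could be computed as follows. The helper name ‘raw_sample_moments’ and the sample values are hypothetical and are not part of the library.

```python
def raw_sample_moments(data, n_moments=5):
    """Return the first `n_moments` raw sample moments E[x^k], k = 1..n_moments."""
    n = len(data)
    return [sum(x ** k for x in data) / n for k in range(1, n_moments + 1)]


# Hypothetical return series, used only to illustrate the call.
returns = [0.01, -0.02, 0.015, 0.03, -0.005]
moments = raw_sample_moments(returns)  # list of five floats
```

A list built this way has the shape described for ‘moments’ above.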
- fit(mu_2)
-
Fits the parameters that describe the mixture of the 2 Normal distributions for a given set of initial parameter guesses.
- Parameters:
-
mu_2 – (float) An initial estimate for the mean of the second distribution.
- get_moments(parameters, return_result=False)
-
Calculates and returns the first five (1…5) raw moments corresponding to the newly estimated parameters.
- Parameters:
-
-
parameters – (list) List of parameters in the specific order [mu_1, mu_2, sigma_1, sigma_2, p_1].
-
return_result – (bool) If True, method returns a result instead of setting the ‘self.new_moments’ attribute.
-
- Returns:
-
(list) List of the first five moments.
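The raw moments of a two-component Gaussian mixture are the probability-weighted sums of the raw moments of each component. The sketch below illustrates that relationship using the standard closed-form raw moments of a normal distribution. It is a standalone illustration, not the library's code, and the function name ‘mixture_raw_moments’ is hypothetical.

```python
def mixture_raw_moments(parameters):
    """First five raw moments of a mixture of two normals.

    `parameters` follows the order [mu_1, mu_2, sigma_1, sigma_2, p_1].
    """
    mu_1, mu_2, sigma_1, sigma_2, p_1 = parameters
    p_2 = 1.0 - p_1

    def normal_raw_moments(mu, sigma):
        # Closed-form raw moments E[X^k], k = 1..5, of N(mu, sigma^2).
        var = sigma ** 2
        return [
            mu,
            var + mu ** 2,
            3 * var * mu + mu ** 3,
            3 * var ** 2 + 6 * var * mu ** 2 + mu ** 4,
            15 * var ** 2 * mu + 10 * var * mu ** 3 + mu ** 5,
        ]

    first = normal_raw_moments(mu_1, sigma_1)
    second = normal_raw_moments(mu_2, sigma_2)
    # Weight each component's moments by its mixture probability.
    return [p_1 * a + p_2 * b for a, b in zip(first, second)]
```

For example, two identical standard normal components reproduce the standard normal moments, so `mixture_raw_moments([0, 0, 1, 1, 0.5])` evaluates to values equal to `[0, 1, 0, 3, 0]`.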
- iter_4(mu_2, p_1)
-
Evaluation of the set of equations that make up variant #1 of the EF3M algorithm (fitting using the first four moments).
- Parameters:
-
-
mu_2 – (float) Initial parameter value for mu_2.
-
p_1 – (float) Probability defining the mixture; p_1, 1 - p_1.
-
- Returns:
-
(list) List of estimated parameters if no invalid values are encountered (e.g. complex values, divide-by-zero), otherwise an empty list is returned.
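One pass of a four-moment update can be sketched by solving the mixture moment equations in sequence: the first moment gives mu_1, the second and third give the two variances, and the fourth gives an updated p_1. The equations below are derived from the standard raw moments of a Gaussian mixture and may differ in detail from the library's implementation. The function name ‘ef3m_step_4’ and the standalone ‘moments’ argument are assumptions made for illustration.

```python
import math


def ef3m_step_4(moments, mu_2, p_1):
    """One four-moment update; returns [mu_1, mu_2, sigma_1, sigma_2, p_1] or []."""
    m_1, m_2, m_3, m_4 = moments[:4]
    try:
        # First moment: m_1 = p_1 * mu_1 + (1 - p_1) * mu_2.
        mu_1 = (m_1 - (1 - p_1) * mu_2) / p_1
        # Second and third moments combined, solved for sigma_2 squared.
        sigma_2_sq = (
            m_3 + 2 * p_1 * mu_1 ** 3 + (p_1 - 1) * mu_2 ** 3
            - 3 * mu_1 * (m_2 + mu_2 ** 2 * (p_1 - 1))
        ) / (3 * (1 - p_1) * (mu_2 - mu_1))
        # Second moment, solved for sigma_1 squared.
        sigma_1_sq = (
            (m_2 - sigma_2_sq - mu_2 ** 2) / p_1
            + sigma_2_sq + mu_2 ** 2 - mu_1 ** 2
        )
        if sigma_1_sq < 0 or sigma_2_sq < 0:
            return []  # a square root would be complex
        # Fourth raw moment of each component, then solve for p_1.
        a_1 = 3 * sigma_1_sq ** 2 + 6 * sigma_1_sq * mu_1 ** 2 + mu_1 ** 4
        a_2 = 3 * sigma_2_sq ** 2 + 6 * sigma_2_sq * mu_2 ** 2 + mu_2 ** 4
        p_1_new = (m_4 - a_2) / (a_1 - a_2)
    except ZeroDivisionError:
        return []
    if not 0 <= p_1_new <= 1:
        return []  # not a valid probability
    return [mu_1, mu_2, math.sqrt(sigma_1_sq), math.sqrt(sigma_2_sq), p_1_new]
```

When the inputs already equal the true mu_2 and p_1, the step returns the true parameters unchanged, which makes a convenient sanity check for the equations.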
- iter_5(mu_2, p_1)
-
Evaluation of the set of equations that make up variant #2 of the EF3M algorithm (fitting using the first five moments).
- Parameters:
-
-
mu_2 – (float) Initial parameter value for mu_2.
-
p_1 – (float) Probability defining the mixture; p_1, 1-p_1.
-
- Returns:
-
(list) List of estimated parameters if no invalid values are encountered (e.g. complex values, divide-by-zero), otherwise an empty list is returned.
- mp_fit()
-
Parallelized implementation of the ‘single_fit_loop’ method. Makes use of multiprocessing to execute multiple calls of ‘single_fit_loop’ in parallel.
- Note: Currently this is not implemented as a parallel implementation, since benchmarked performance is identical to a single thread.
- Returns:
-
(pd.DataFrame) Fitted parameters and error.
- single_fit_loop(epsilon=0.0)
-
A single scan through the list of mu_2 values, cataloging the successful fittings in a DataFrame.
- Parameters:
-
epsilon – (float) Fitting tolerance.
- Returns:
-
(pd.DataFrame) Fitted parameters and error.
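The scan over mu_2 values needs a grid of candidates. The sketch below shows one plausible construction, assuming the grid starts just above the mixture mean and advances in steps of epsilon * factor * standard deviation, so that it spans roughly ‘factor’ standard deviations. This spacing is an assumption for illustration and the library's actual grid may differ. The function name ‘mu_2_grid’ is hypothetical.

```python
def mu_2_grid(moments, epsilon=1e-2, factor=5):
    """Candidate mu_2 values above the mean (assumed spacing, see lead-in)."""
    mean = moments[0]
    # Standard deviation from the first two raw moments.
    std_dev = (moments[1] - moments[0] ** 2) ** 0.5
    step = epsilon * factor * std_dev
    return [mean + i * step for i in range(1, int(1 / epsilon))]


# Zero mean, unit variance, coarse tolerance: nine candidates from 0.5 to 4.5.
grid = mu_2_grid([0.0, 1.0], epsilon=0.1, factor=5)
```

Under this construction a smaller ‘epsilon’ produces a finer and longer scan, while ‘factor’ controls how far from the mean the candidates extend.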
Utility Functions For Fitting Of Distribution Mixtures
- centered_moment(moments, order)
-
Compute a single moment of a specific order about the mean (centered) given moments about the origin (raw).
- Parameters:
-
-
moments – (list) First ‘order’ raw moments.
-
order – (int) The order of the moment to calculate.
-
- Returns:
-
(float) The central moment of specified order.
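The conversion from raw to central moments follows from the binomial expansion of E[(X - m_1)^n]. A minimal sketch of that expansion is shown below. It is a standalone illustration with the hypothetical name ‘central_from_raw’, not the library's code.

```python
from math import comb


def central_from_raw(moments, order):
    """Central moment of the given order from raw moments 1..order."""
    mean = moments[0]
    raw = [1.0] + list(moments)  # raw[k] is the k-th raw moment; raw[0] = 1
    # E[(X - mean)^n] = sum_k C(n, k) * E[X^k] * (-mean)^(n - k)
    return sum(
        comb(order, k) * raw[k] * (-mean) ** (order - k)
        for k in range(order + 1)
    )
```

For instance, with raw moments `[1.0, 2.0]` the second central moment (the variance) is 2.0 - 1.0 ** 2, which is 1.0.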
- raw_moment(central_moments, dist_mean)
-
Calculates a list of raw moments given a list of central moments.
- Parameters:
-
-
central_moments – (list) The first n (1…n) central moments as a list.
-
dist_mean – (float) The mean of the distribution.
-
- Returns:
-
(list) The first n+1 (0…n) raw moments.
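The reverse conversion expands E[((X - mean) + mean)^n] with the binomial theorem. The sketch below returns the raw moments of order 0 through n, matching the return description above, and assumes the first central moment supplied is 0. The function name ‘raw_from_central’ is hypothetical and the code is an illustration rather than the library's implementation.

```python
from math import comb


def raw_from_central(central_moments, dist_mean):
    """Raw moments of order 0..n from central moments 1..n and the mean."""
    central = [1.0] + list(central_moments)  # central[0] = 1
    n = len(central_moments)
    # E[X^j] = sum_k C(j, k) * central[k] * mean^(j - k)
    return [
        sum(
            comb(order, k) * central[k] * dist_mean ** (order - k)
            for k in range(order + 1)
        )
        for order in range(n + 1)
    ]
```

As a check, the central moments `[0, 1, 0, 3]` of a unit-variance normal with mean 2.0 map to raw moments equal to `[1, 2, 5, 14, 43]`.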
- most_likely_parameters(data, ignore_columns='error', res=10000)
-
Determines the most likely parameter estimate using a KDE from the DataFrame of the results of the fit from the M2N object.
- Parameters:
-
-
data – (pd.DataFrame) Contains parameter estimates from all runs.
-
ignore_columns – (str/list) Column or columns to exclude from analysis.
-
res – (int) Resolution of the kernel density estimate.
-
- Returns:
-
(dict) Labels and most likely estimates for parameters.
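The idea of picking the most likely estimate from a kernel density estimate can be sketched with the standard library alone. The version below assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth and takes a plain dict of lists in place of a DataFrame. These choices, and the names ‘kde_mode’ and ‘most_likely’, are assumptions for illustration and may not match the library's implementation.

```python
import math
import statistics


def kde_mode(samples, res=1000):
    """Grid point where a Gaussian KDE of `samples` is highest."""
    std_dev = statistics.stdev(samples)
    if std_dev == 0:
        return samples[0]  # all estimates identical
    # Normal-reference rule-of-thumb bandwidth (an assumption).
    bandwidth = 1.06 * std_dev * len(samples) ** (-1 / 5)
    low, high = min(samples), max(samples)
    grid = [low + (high - low) * i / (res - 1) for i in range(res)]

    def density(x):
        # Unnormalised density; the scale does not affect the argmax.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return max(grid, key=density)


def most_likely(results, ignore=("error",), res=1000):
    """Map each parameter name to the KDE mode of its estimates."""
    return {
        name: kde_mode(values, res)
        for name, values in results.items()
        if name not in ignore
    }
```

Taking the density peak rather than the mean makes the estimate less sensitive to a few runs that converged to outlying values.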
Research Notebook
The following research notebooks can be used to better understand bet sizing.
EF3M Algorithm
Chapter 10 Exercise Notebook