Second Generation Models
Second-generation microstructural models (strategic trade models) focus on understanding and quantifying illiquidity. Illiquidity carries a risk premium, which makes measures of it useful features in financial machine learning models.
Second-generation models explain trading as a strategic interaction between informed and uninformed traders, which gives them a stronger theoretical framework than first-generation models. These models emphasize the importance of signed volume and order flow imbalance. Most of the parameters of interest (such as lambda) are estimated by regression. Inter-bar microstructural features can be computed while bars (time, volume, imbalance, or run bars) are being created. The three second-generation models described in this section are:
- Kyle’s Lambda
- Amihud’s Lambda
- Hasbrouck’s Lambda
Note
Underlying Literature
The following sources elaborate extensively on the topic:
- Advances in Financial Machine Learning, Chapter 19, Section 4, by Marcos Lopez de Prado. Describes the emergence and modern-day uses of the second generation of microstructural features in more detail.
Kyle’s Lambda
The following description is based on Section 19.4.1 of Advances in Financial Machine Learning:
“Kyle (1985) introduced the following strategic trade model. Consider a risky asset with terminal value \(v \sim N\left[p_{0}, \Sigma_{0}\right]\), as well as two traders:
- A noise trader who trades a quantity \(u \sim N\left[0, \sigma_{u}^{2}\right]\), independent of \(v\).
- An informed trader who knows \(v\) and demands a quantity \(x\), through a market order.
The market maker observes the total order flow \(y=x+u\), and sets a price \(p\) accordingly. In this model, market makers cannot distinguish between orders from noise traders and informed traders. They adjust prices as a function of the order flow imbalance, as that may indicate the presence of an informed trader. Hence, there is a positive relationship between price change and order flow imbalance, which is called market impact.
The informed trader conjectures that the market maker has a linear price adjustment function, \(p=\lambda y+\mu\), where \(\lambda\) is an inverse measure of liquidity. The informed trader’s profits are \(\pi=(v-p) x\), which are maximized at \(x=\frac{v-\mu}{2 \lambda}\), with second order condition \(\lambda>0\).
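Conversely, the market maker conjectures that the informed trader’s demand is a linear function of \(v: x=\alpha+\beta v\), which implies \(\alpha=-\frac{\mu}{2 \lambda}\) and \(\beta=\frac{1}{2 \lambda}\). Note that lower liquidity means higher \(\lambda\), which means lower demand from the informed trader. Kyle argues that the market maker must find an equilibrium between profit maximization and market efficiency, and that under the above linear functions, the only possible solution occurs when:
\[\mu=p_{0}, \quad \alpha=-p_{0} \sqrt{\frac{\sigma_{u}^{2}}{\Sigma_{0}}}, \quad \lambda=\frac{1}{2} \sqrt{\frac{\Sigma_{0}}{\sigma_{u}^{2}}}, \quad \beta=\sqrt{\frac{\sigma_{u}^{2}}{\Sigma_{0}}}\]
Finally, the informed trader’s expected profit can be rewritten as
\[E[\pi]=\frac{\left(v-p_{0}\right)^{2}}{2} \sqrt{\frac{\sigma_{u}^{2}}{\Sigma_{0}}}=\frac{1}{4 \lambda}\left(v-p_{0}\right)^{2}\]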
The implication is that the informed trader has three sources of profit:
- The security’s mispricing.
- The variance of the noise trader’s net order flow. The higher the noise, the easier the informed trader can conceal his intentions.
- The reciprocal of the terminal security’s variance. The lower the volatility, the easier to monetize the mispricing.
In Kyle’s model, the variable \(\lambda\) captures price impact. Illiquidity increases with uncertainty about \(v\) and decreases with the amount of noise. As a feature, it can be estimated by fitting the regression
\[\Delta p_{t}=\lambda\left(b_{t} V_{t}\right)+\varepsilon_{t}\]
where \(\left\{p_{t}\right\}\) is the time series of prices, \(\left\{b_{t}\right\}\) is the time series of aggressor flags, \(\left\{V_{t}\right\}\) is the time series of traded volumes, and hence \(\left\{b_{t} V_{t}\right\}\) is the time series of signed volume or net order flow.”
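The regression of price changes on signed volume described above can be sketched as a rolling least-squares fit. The snippet below is a minimal illustration under stated assumptions, not the library's implementation: the function name `rolling_kyle_lambda`, the fit through the origin, and the synthetic data with a known impact coefficient are all hypothetical choices made for this example.

```python
import numpy as np
import pandas as pd


def rolling_kyle_lambda(close, volume, aggressor_flags=None, window=20):
    """Rolling no-intercept OLS slope of price change on signed volume.

    Hypothetical helper for illustration only.
    """
    price_diff = close.diff()
    if aggressor_flags is None:
        # Assumption: fall back to the sign of the bar-to-bar price change.
        aggressor_flags = np.sign(price_diff)
    signed_volume = aggressor_flags * volume
    # Slope of a regression through the origin: sum(x * y) / sum(x * x).
    numerator = (signed_volume * price_diff).rolling(window).sum()
    denominator = (signed_volume ** 2).rolling(window).sum()
    return numerator / denominator


# Synthetic bars with a known impact coefficient, used to sanity-check the fit.
rng = np.random.default_rng(0)
n_bars = 500
true_lambda = 2e-4
flags = pd.Series(rng.choice([-1.0, 1.0], size=n_bars))
volume = pd.Series(rng.uniform(100.0, 1000.0, size=n_bars))
noise = rng.normal(0.0, 0.01, size=n_bars)
price_changes = true_lambda * flags * volume + noise
close = 100.0 + price_changes.cumsum()

kyle_lambda = rolling_kyle_lambda(close, volume, aggressor_flags=flags, window=50)
print(kyle_lambda.dropna().mean())  # should be close to true_lambda
```

The first `window` values are NaN because the first price difference is undefined. A fit with an intercept, or one that also reports a t-statistic, would need the residual variance of each window as well.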
Implementation
get_bar_based_kyle_lambda(close: Series, volume: Series, aggressor_flags: Series | None = None, window: int = 20) → Series
Advances in Financial Machine Learning, p. 286-288.
Get Kyle lambda from bars data.
- Parameters:
  - close – (pd.Series) Close prices.
  - volume – (pd.Series) Bar volume.
  - aggressor_flags – (pd.Series) Series of indicators in {-1, 1}: 1 if a bar was a buy, -1 if it was a sell. If None, the sign of price differences is used.
  - window – (int) Rolling window used for estimation.
- Returns:
  - (pd.Series) Kyle lambdas & t-statistic.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from mlfinlab.microstructural_features.second_generation import get_bar_based_kyle_lambda
>>> # Load data
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col='date_time', parse_dates=[0])
>>> kyle_lambda_signed = get_bar_based_kyle_lambda(data.close, data.cum_vol,
... aggressor_flags=np.sign(data.close.diff()),
... window=20)
>>> kyle_lambda_signed
kyle_lambda...
Amihud’s Lambda
The following description is based on Section 19.4.2 of Advances in Financial Machine Learning:
“Amihud (2002) studies the positive relationship between absolute returns and illiquidity. In particular, he computes the daily price response associated with one dollar of trading volume, and argues its value is a proxy of price impact. One possible implementation of this idea is:
\[\left|\Delta \log \left[\tilde{p}_{\tau}\right]\right|=\lambda \sum_{t \in B_{\tau}}\left(p_{t} V_{t}\right)+\varepsilon_{\tau}\]
where \(B_{\tau}\) is the set of trades included in bar \(\tau\), \(\tilde{p}_{\tau}\) is the closing price of bar \(\tau\), and \(p_{t} V_{t}\) is the dollar volume involved in trade \(t \in B_{\tau}\). Despite its apparent simplicity, Hasbrouck (2009) found that daily Amihud’s lambda estimates exhibit a high rank correlation to intraday estimates of effective spread.”
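The relationship between absolute log returns and bar dollar volume described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `rolling_amihud_lambda`, the rolling fit through the origin, and the synthetic data are assumptions made for this example, and the bar's dollar volume is taken as already summed over its trades.

```python
import numpy as np
import pandas as pd


def rolling_amihud_lambda(close, dollar_volume, window=20):
    """Rolling no-intercept OLS slope of |log return| on bar dollar volume.

    Hypothetical helper for illustration only.
    """
    abs_log_return = np.log(close).diff().abs()
    # Slope of a regression through the origin: sum(x * y) / sum(x * x).
    numerator = (dollar_volume * abs_log_return).rolling(window).sum()
    denominator = (dollar_volume ** 2).rolling(window).sum()
    return numerator / denominator


# Synthetic bars whose absolute log return is proportional to dollar volume.
rng = np.random.default_rng(1)
n_bars = 500
true_amihud = 1e-8
dollar_volume = pd.Series(rng.uniform(1e5, 1e6, size=n_bars))
noise = rng.uniform(-2e-4, 2e-4, size=n_bars)
abs_returns = true_amihud * dollar_volume + noise  # stays positive by construction
signs = rng.choice([-1.0, 1.0], size=n_bars)
close = 100.0 * np.exp((signs * abs_returns).cumsum())

amihud_lambda = rolling_amihud_lambda(close, dollar_volume, window=50)
print(amihud_lambda.dropna().mean())  # should be close to true_amihud
```

Because the regressor is unsigned, no aggressor flags are needed here, which is why this estimate can be computed from price and volume data alone.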
Implementation
get_bar_based_amihud_lambda(close: Series, dollar_volume: Series, window: int = 20) → Series
Advances in Financial Machine Learning, p. 288-289.
Get Amihud lambda from bars data.
- Parameters:
  - close – (pd.Series) Close prices.
  - dollar_volume – (pd.Series) Dollar volumes.
  - window – (int) Rolling window used for estimation.
- Returns:
  - (pd.Series) Amihud lambdas & t-statistic.
Example
>>> import pandas as pd
>>> from mlfinlab.microstructural_features.second_generation import (
... get_bar_based_amihud_lambda,
... )
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col="date_time", parse_dates=[0])
>>> amihud_lambda = get_bar_based_amihud_lambda(data.close, data.cum_dollar, window=20)
>>> amihud_lambda
amihud_lambda...
Hasbrouck’s Lambda
The following description is based on Section 19.4.3 of Advances in Financial Machine Learning:
“Hasbrouck (2009) follows up on Kyle’s and Amihud’s ideas, and applies them to estimating the price impact coefficient based on trade-and-quote (TAQ) data. He uses a Gibbs sampler to produce a Bayesian estimation of the regression specification
\[\log \left[\tilde{p}_{i, \tau}\right]-\log \left[\tilde{p}_{i, \tau-1}\right]=\lambda_{i} \sum_{t \in B_{i, \tau}}\left(b_{i, t} \sqrt{p_{i, t} V_{i, t}}\right)+\varepsilon_{i, \tau}\]
where \(B_{i, \tau}\) is the set of trades included in bar \(\tau\) for security \(i\), with \(i=1, \ldots, I\); \(\tilde{p}_{i, \tau}\) is the closing price of bar \(\tau\) for security \(i\); \(b_{i, t} \in\{-1,1\}\) indicates whether trade \(t \in B_{i, \tau}\) was buy-initiated or sell-initiated; and \(p_{i, t} V_{i, t}\) is the dollar volume involved in trade \(t \in B_{i, \tau}\). We can then estimate \(\lambda_{i}\) for every security \(i\), and use it as a feature that approximates the effective cost of trading (market impact).”
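The regression of log returns on the signed square root of dollar volume described above can be sketched for a single security as follows. This is a minimal illustration with two loudly stated substitutions: it uses a rolling least-squares fit in place of the Gibbs sampler, and it approximates the per-trade sum by one signed square root of the bar's total dollar volume. The function name `rolling_hasbrouck_lambda` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_hasbrouck_lambda(close, dollar_volume, aggressor_flags=None, window=20):
    """Rolling no-intercept OLS slope of log return on signed sqrt dollar volume.

    Hypothetical helper for illustration only; OLS stands in for the
    Bayesian (Gibbs sampler) estimation described in the text.
    """
    log_return = np.log(close).diff()
    if aggressor_flags is None:
        # Assumption: fall back to the sign of the bar-to-bar log return.
        aggressor_flags = np.sign(log_return)
    # Bar-level approximation of the per-trade sum of b * sqrt(p * V).
    signed_root_dv = aggressor_flags * np.sqrt(dollar_volume)
    numerator = (signed_root_dv * log_return).rolling(window).sum()
    denominator = (signed_root_dv ** 2).rolling(window).sum()
    return numerator / denominator


# Synthetic bars with a known impact coefficient, used to sanity-check the fit.
rng = np.random.default_rng(2)
n_bars = 500
true_hasbrouck = 1e-5
flags = pd.Series(rng.choice([-1.0, 1.0], size=n_bars))
dollar_volume = pd.Series(rng.uniform(1e5, 1e6, size=n_bars))
noise = rng.normal(0.0, 5e-4, size=n_bars)
log_returns = true_hasbrouck * flags * np.sqrt(dollar_volume) + noise
close = 100.0 * np.exp(log_returns.cumsum())

hasbrouck_lambda = rolling_hasbrouck_lambda(
    close, dollar_volume, aggressor_flags=flags, window=50
)
print(hasbrouck_lambda.dropna().mean())  # should be close to true_hasbrouck
```

With trade-level data, the regressor would instead be accumulated trade by trade inside each bar, as in the specification quoted above.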
Implementation
get_bar_based_hasbrouck_lambda(close: Series, dollar_volume: Series, aggressor_flags: Series | None = None, window: int = 20) → Series
Advances in Financial Machine Learning, p. 289-290.
Get Hasbrouck lambda from bars data.
- Parameters:
  - close – (pd.Series) Close prices.
  - dollar_volume – (pd.Series) Dollar volumes.
  - aggressor_flags – (pd.Series) Series of indicators in {-1, 1}: 1 if a bar was a buy, -1 if it was a sell. If None, the sign of price differences is used.
  - window – (int) Rolling window used for estimation.
- Returns:
  - (pd.Series) Hasbrouck lambdas & t-statistic.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from mlfinlab.microstructural_features.second_generation import get_bar_based_hasbrouck_lambda
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col='date_time', parse_dates=[0])
>>> hasbrouck_lambda = get_bar_based_hasbrouck_lambda(data.close, data.cum_dollar,
... aggressor_flags=np.sign(data.close.diff()),
... window=20)
>>> hasbrouck_lambda
hasbrouck_lambda...
Research Notebook
The following research notebooks can be used to better understand second-generation microstructural features.
Presentation Slides
Note
pg 1-14: Structural Breaks
pg 15-24: Entropy Features
pg 25-37: Microstructural Features