Second Generation Models

Second-generation microstructural models (strategic trade models) focus on understanding and quantifying illiquidity. Because illiquidity carries a risk premium, measures of it are useful features in financial machine learning models.

Second-generation models explain trading as a strategic interaction between informed and uninformed traders, which gives them a stronger theoretical framework than first-generation models. These models emphasize the importance of signed volume and order flow imbalance. Most of the parameters of interest (such as lambda) are estimated by regression. Inter-bar microstructural features can be computed once bars (such as time, volume, imbalance, or run bars) have been created. The three second-generation models described in this section are:

  • Kyle’s Lambda

  • Amihud’s Lambda

  • Hasbrouck’s Lambda

Figure: Closing prices in blue, and Kyle’s Lambda in red.

Note

Underlying Literature

The following sources elaborate extensively on the topic:

  • Advances in Financial Machine Learning, Chapter 19, Section 4, by Marcos Lopez de Prado. Describes the emergence and modern-day uses of the second generation of microstructural features in more detail.


Kyle’s Lambda

The following description is based on Section 19.4.1 of Advances in Financial Machine Learning:

Kyle (1985) introduced the following strategic trade model. Consider a risky asset with terminal value \(v \sim N\left[p_{0}, \Sigma_{0}\right]\), as well as two traders:

  • A noise trader who trades a quantity \(u \sim N\left[0, \sigma_{u}^{2}\right]\), independent of \(v\).

  • An informed trader who knows \(v\) and demands a quantity \(x\), through a market order.

The market maker observes the total order flow \(y=x+u\), and sets a price \(p\) accordingly. In this model, market makers cannot distinguish between orders from noise traders and informed traders. They adjust prices as a function of the order flow imbalance, as that may indicate the presence of an informed trader. Hence, there is a positive relationship between price change and order flow imbalance, which is called market impact.

The informed trader conjectures that the market maker has a linear price adjustment function, \(p=\lambda y+\mu\), where \(\lambda\) is an inverse measure of liquidity. The informed trader’s profits are \(\pi=(v-p) x\), which are maximized at \(x=\frac{v-\mu}{2 \lambda}\), with second order condition \(\lambda>0\).
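As a worked step, the maximizer can be derived directly. This is a sketch that takes the conjectured linear pricing rule as given and uses only \(\mathrm{E}[u]=0\) and the independence of \(u\) and \(v\):

\[\mathrm{E}[\pi \mid v]=\mathrm{E}[(v-\lambda(x+u)-\mu) x \mid v]=(v-\mu-\lambda x) x\]

\[\frac{\partial \mathrm{E}[\pi \mid v]}{\partial x}=v-\mu-2 \lambda x=0 \quad \Rightarrow \quad x=\frac{v-\mu}{2 \lambda}\]

\[\frac{\partial^{2} \mathrm{E}[\pi \mid v]}{\partial x^{2}}=-2 \lambda<0 \quad \Longleftrightarrow \quad \lambda>0\]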

Conversely, the market maker conjectures that the informed trader’s demand is a linear function of \(v\): \(x=\alpha+\beta v\), which implies \(\alpha=-\frac{\mu}{2 \lambda}\) and \(\beta=\frac{1}{2 \lambda}\). Note that lower liquidity means higher \(\lambda\), which means lower demand from the informed trader. Kyle argues that the market maker must find an equilibrium between profit maximization and market efficiency, and that under the above linear functions, the only possible solution occurs when:

\[\begin{split}\mu=p_{0} \\ \alpha=-p_{0} \sqrt{\frac{\sigma_{u}^{2}}{\Sigma_{0}}} \\ \lambda=\frac{1}{2} \sqrt{\frac{\Sigma_{0}}{\sigma_{u}^{2}}} \\ \beta=\sqrt{\frac{\sigma_{u}^{2}}{\Sigma_{0}}}\end{split}\]

Finally, the informed trader’s expected profit can be rewritten as

\[\mathrm{E}[\pi]=\frac{\left(v-p_{0}\right)^{2}}{2} \sqrt{\frac{\sigma_{u}^{2}}{\Sigma_{0}}}=\frac{1}{4 \lambda}\left(v-p_{0}\right)^{2}\]

The implication is that the informed trader has three sources of profit:

  • The security’s mispricing.

  • The variance of the noise trader’s net order flow. The higher the noise, the easier the informed trader can conceal his intentions.

  • The reciprocal of the terminal security’s variance. The lower the volatility, the easier to monetize the mispricing.
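The equilibrium quantities can be checked numerically. The following is a minimal sketch with made-up parameter values (`p0`, `Sigma0`, `sigma_u2` and `v` are illustrative assumptions, not taken from the source). It computes \(\lambda\) from its closed form, obtains \(\alpha\) and \(\beta\) from the relations \(\alpha=-\frac{\mu}{2\lambda}\) and \(\beta=\frac{1}{2\lambda}\), and confirms on a grid that the informed trader's demand maximizes expected profit.

```python
import numpy as np

# Illustrative (assumed) parameter values; none of them come from the source.
p0 = 100.0         # prior mean of the terminal value v
Sigma0 = 4.0       # prior variance of v
sigma_u2 = 2500.0  # variance of the noise trader's order flow u
v = 103.0          # realized terminal value, known to the informed trader

# Equilibrium price impact and the coefficients it implies.
lam = 0.5 * np.sqrt(Sigma0 / sigma_u2)
mu = p0
alpha = -mu / (2.0 * lam)
beta = 1.0 / (2.0 * lam)

# Informed trader's demand x = alpha + beta * v.
x_star = alpha + beta * v


def expected_profit(x):
    """Expected profit (v - p) x with p = lam * (x + u) + mu and E[u] = 0."""
    return (v - mu - lam * x) * x


# Brute-force check that x_star maximizes expected profit.
grid = np.linspace(0.0, 2.0 * x_star, 2001)
x_best = grid[np.argmax(expected_profit(grid))]

# Closed-form expected profit (v - p0)^2 / (4 * lambda).
closed_form = (v - p0) ** 2 / (4.0 * lam)

print(lam, x_star, x_best, expected_profit(x_star), closed_form)
```

Raising `sigma_u2` or lowering `Sigma0` lowers `lam` and raises the expected profit, which matches the second and third sources of profit listed above.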

In Kyle’s model, the variable \(\lambda\) captures price impact. Illiquidity increases with uncertainty about \(v\) and decreases with the amount of noise. As a feature, it can be estimated by fitting the regression

\[\Delta p_{t}=\lambda\left(b_{t} V_{t}\right)+\varepsilon_{t}\]

where \(\left\{p_{t}\right\}\) is the time series of prices, \(\left\{b_{t}\right\}\) is the time series of aggressor flags, \(\left\{V_{t}\right\}\) is the time series of traded volumes, and hence \(\left\{b_{t} V_{t}\right\}\) is the time series of signed volume or net order flow.
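To make the regression concrete, here is a minimal sketch that fits it by ordinary least squares without an intercept on synthetic data. The data-generating values (`true_lambda`, the noise scale, the volume range) are assumptions chosen for illustration, and this is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_lambda = 2e-4  # assumed price impact per unit of signed volume

# Synthetic trades: volumes V_t and aggressor flags b_t in {-1, +1}.
volume = rng.integers(1, 500, size=n).astype(float)
b = rng.choice([-1.0, 1.0], size=n)
signed_volume = b * volume

# Price changes generated by the regression model itself.
dp = true_lambda * signed_volume + rng.normal(0.0, 0.01, size=n)

# No-intercept OLS: lambda_hat = sum(x * y) / sum(x * x).
sxx = signed_volume @ signed_volume
lambda_hat = (signed_volume @ dp) / sxx

# Standard error and t-statistic of the slope.
resid = dp - lambda_hat * signed_volume
se = np.sqrt((resid @ resid) / (n - 1) / sxx)
t_stat = lambda_hat / se

print(lambda_hat, t_stat)
```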

Implementation

get_bar_based_kyle_lambda(close: Series, volume: Series, aggressor_flags: Series | None = None, window: int = 20) -> Series

Advances in Financial Machine Learning, pp. 286-288.

Get Kyle lambda from bars data.

Parameters:
  • close – (pd.Series) Close prices.

  • volume – (pd.Series) Bar volume.

  • aggressor_flags – (pd.Series) Series of indicators {-1, 1} if a bar was buy(1) or sell (-1). If None, sign of price differences is used.

  • window – (int) Rolling window used for estimation.

Returns:

(pd.Series) Kyle lambdas & t-statistic.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from mlfinlab.microstructural_features.second_generation import get_bar_based_kyle_lambda
>>> # Load data
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col='date_time', parse_dates=[0])
>>> kyle_lambda_signed = get_bar_based_kyle_lambda(data.close, data.cum_vol,
...                                                aggressor_flags=np.sign(data.close.diff()),
...                                                window=20)
>>> kyle_lambda_signed  
kyle_lambda...
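The role of the `window` parameter and of the default aggressor flags can be illustrated with a rolling no-intercept regression. The helper below, `rolling_kyle_lambda_sketch`, is a hypothetical name written for this illustration and is not the library's implementation. It assumes the sign of the close-to-close price difference as the bar's aggressor flag and applies the closed-form slope \(\sum x y / \sum x^{2}\) over each window, using synthetic bars.

```python
import numpy as np
import pandas as pd


def rolling_kyle_lambda_sketch(close, volume, window=20):
    """Hypothetical helper: rolling no-intercept slope of price change on signed volume."""
    dp = close.diff()
    # Assumption: the sign of the price change stands in for the aggressor flag.
    signed_volume = np.sign(dp) * volume
    num = (dp * signed_volume).rolling(window).sum()
    den = (signed_volume ** 2).rolling(window).sum()
    return num / den


# Synthetic bars (illustrative values only).
rng = np.random.default_rng(1)
idx = pd.date_range("2024-01-01", periods=200, freq="min")
volume = pd.Series(rng.integers(100, 1000, size=200).astype(float), index=idx)
close = pd.Series(100.0 + np.cumsum(rng.normal(0.0, 0.05, size=200)), index=idx)

lam = rolling_kyle_lambda_sketch(close, volume, window=20)
print(lam.dropna().head())
```

With sign-of-price-change flags, each product of price change and signed volume is non-negative, so this estimate cannot be negative. The first `window` values are missing because the first price difference is undefined.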

Amihud’s Lambda

The following description is based on Section 19.4.2 of Advances in Financial Machine Learning:

Amihud (2002) studies the positive relationship between absolute returns and illiquidity. In particular, he computes the daily price response associated with one dollar of trading volume, and argues its value is a proxy of price impact. One possible implementation of this idea is:

\[\left|\Delta \log \left[\tilde{p}_{\tau}\right]\right|=\lambda \sum_{t \in B_{\tau}}\left(p_{t} V_{t}\right)+\varepsilon_{\tau}\]

where \(B_{\tau}\) is the set of trades included in bar \(\tau\), \(\tilde{p}_{\tau}\) is the closing price of bar \(\tau\), and \(p_{t} V_{t}\) is the dollar volume involved in trade \(t \in B_{\tau}\). Despite its apparent simplicity, Hasbrouck (2009) found that daily Amihud’s lambda estimates exhibit a high rank correlation to intraday estimates of effective spread.
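A minimal sketch of this regression on synthetic bars follows. It builds the absolute log return from closing prices and regresses it on bar dollar volume without an intercept. The value of `true_lambda`, the volume range and the noise scale are illustrative assumptions, and this is not the library's implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 3000
true_lambda = 1e-9  # assumed absolute log return per dollar traded

# Dollar volume per bar: the sum of p_t * V_t over the trades in the bar.
dollar_volume = pd.Series(rng.uniform(1e5, 1e6, size=n))

# Bar moves whose size grows with dollar volume, with a random direction.
abs_move = true_lambda * dollar_volume + rng.normal(0.0, 1e-5, size=n)
direction = rng.choice([-1.0, 1.0], size=n)
close = 100.0 * np.exp((direction * abs_move).cumsum())

# Left-hand side of the regression: absolute log return of the bar close.
y = np.log(close).diff().abs()
mask = y.notna()  # the first bar has no previous close
x = dollar_volume[mask]

# No-intercept OLS slope.
lambda_hat = (x * y[mask]).sum() / (x ** 2).sum()
print(lambda_hat)
```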

Implementation

get_bar_based_amihud_lambda(close: Series, dollar_volume: Series, window: int = 20) -> Series

Advances in Financial Machine Learning, pp. 288-289.

Get Amihud lambda from bars data.

Parameters:
  • close – (pd.Series) Close prices.

  • dollar_volume – (pd.Series) Dollar volumes.

  • window – (int) rolling window used for estimation.

Returns:

(pd.Series) of Amihud lambda & t-statistic.

Example

>>> import pandas as pd
>>> from mlfinlab.microstructural_features.second_generation import (
...     get_bar_based_amihud_lambda,
... )
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col="date_time", parse_dates=[0])
>>> amihud_lambda = get_bar_based_amihud_lambda(data.close, data.cum_dollar, window=20)
>>> amihud_lambda  
amihud_lambda...

Hasbrouck’s Lambda

The following description is based on Section 19.4.3 of Advances in Financial Machine Learning:

Hasbrouck (2009) follows up on Kyle’s and Amihud’s ideas, and applies them to estimating the price impact coefficient based on trade-and-quote (TAQ) data. He uses a Gibbs sampler to produce a Bayesian estimation of the regression specification

\[\log \left[\tilde{p}_{i, \tau}\right]-\log \left[\tilde{p}_{i, \tau-1}\right]=\lambda_{i} \sum_{t \in B_{i, \tau}}\left(b_{i, t} \sqrt{p_{i, t} V_{i, t}}\right)+\varepsilon_{i, \tau}\]

where \(B_{i, \tau}\) is the set of trades included in bar \(\tau\) for security \(i\), with \(i=1, \ldots, I\); \(\tilde{p}_{i, \tau}\) is the closing price of bar \(\tau\) for security \(i\); \(b_{i, t} \in\{-1,1\}\) indicates whether trade \(t \in B_{i, \tau}\) was buy-initiated or sell-initiated; and \(p_{i, t} V_{i, t}\) is the dollar volume involved in trade \(t \in B_{i, \tau}\). We can then estimate \(\lambda_{i}\) for every security \(i\), and use it as a feature that approximates the effective cost of trading (market impact).
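The regressor in this specification is built from individual trades and then aggregated per bar. The sketch below shows that aggregation for a single security on synthetic trades and fits the slope with plain no-intercept OLS. Note that OLS is swapped in here for simplicity in place of the Gibbs sampler that Hasbrouck uses. The column names and numeric values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_bars, trades_per_bar = 2000, 10
n_trades = n_bars * trades_per_bar
true_lambda = 1e-6  # assumed impact coefficient

# Synthetic trades, each assigned to a bar.
trades = pd.DataFrame({
    "bar": np.repeat(np.arange(n_bars), trades_per_bar),
    "b": rng.choice([-1.0, 1.0], size=n_trades),             # aggressor flag
    "dollar_volume": rng.uniform(1e3, 1e5, size=n_trades),   # p_t * V_t
})

# Per-trade term b_t * sqrt(p_t * V_t), summed over the trades in each bar.
trades["signed_sqrt_dv"] = trades["b"] * np.sqrt(trades["dollar_volume"])
x = trades.groupby("bar")["signed_sqrt_dv"].sum()

# Bar log returns generated by the regression model itself.
log_ret = true_lambda * x + rng.normal(0.0, 1e-4, size=n_bars)

# No-intercept OLS slope (a simple stand-in for the Bayesian estimate).
lambda_hat = (x * log_ret).sum() / (x ** 2).sum()
print(lambda_hat)
```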

Implementation

get_bar_based_hasbrouck_lambda(close: Series, dollar_volume: Series, aggressor_flags: Series | None = None, window: int = 20) -> Series

Advances in Financial Machine Learning, pp. 289-290.

Get Hasbrouck lambda from bars data.

Parameters:
  • close – (pd.Series) Close prices.

  • dollar_volume – (pd.Series) Dollar volumes.

  • aggressor_flags – (pd.Series) Series of indicators {-1, 1} if a bar was buy(1) or sell (-1). If None, sign of price differences is used.

  • window – (int) Rolling window used for estimation.

Returns:

(pd.Series) Hasbrouck lambdas series & t-statistic.

Example

>>> import numpy as np
>>> import pandas as pd
>>> from mlfinlab.microstructural_features.second_generation import get_bar_based_hasbrouck_lambda
>>> url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/dollar_bars.csv"
>>> data = pd.read_csv(url, index_col='date_time', parse_dates=[0])
>>> hasbrouck_lambda = get_bar_based_hasbrouck_lambda(data.close, data.cum_dollar,
...                                                   aggressor_flags=np.sign(data.close.diff()),
...                                                   window=20)
>>> hasbrouck_lambda  
hasbrouck_lambda...

Research Notebook

The following research notebooks can be used to better understand the second-generation microstructural features.


Presentation Slides


Note

  • pg 1-14: Structural Breaks

  • pg 15-24: Entropy Features

  • pg 25-37: Microstructural Features


References