Online Data Structures

The classes below use tick data to produce time, volume, dollar, imbalance and run bars in a live, streaming fashion: new ticks can be added as they arrive, and bars are produced on the go. The functionality matches that described in the Standard Data Structures, Imbalance Bars and Run Bars sections in MlFinLab. To process a large dataset stored as a CSV file, a Parquet file or a pandas DataFrame, use the standard Data Structures functionality; use these Online Data Structure classes to generate bars on the go.

For those new to the topic, it is discussed in Chapter 2 of the graduate-level textbook Advances in Financial Machine Learning.

Each generated bar contains the following:

  • timestamp – the timestamp at which the bar ends

  • start timestamp – the first tick timestamp of the bar as specified by the threshold

  • open – the open price of the security in the generated bar

  • high – the highest price of the security in the generated bar

  • low – the lowest price of the security in the generated bar

  • close – the close price of the security in the generated bar

  • volume – the volume of the security traded in the bar

  • cum_buy_volume – cumulative buy volume of ticks in the bar

  • cum_dollar_value – cumulative dollar value of ticks in the bar

  • tick_rule_buy_volume – amount of buy volume estimated by the Tick Rule

  • num_ticks – number of ticks in the bar

  • ticker – chosen ticker for the class
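
As a rough illustration of how most of these fields relate to the underlying ticks, the sketch below aggregates a handful of ticks into a single bar dictionary. It is not MlFinLab's implementation: the helper `build_bar`, the tick layout, the `start_timestamp` key spelling and the reading of `cum_buy_volume` as the volume of ticks whose aggressor side is 1 are all assumptions made for this example.

```python
from datetime import datetime


def build_bar(ticks, ticker):
    """Aggregate a non-empty list of tick dicts into one bar dict (illustrative only)."""
    prices = [t["price"] for t in ticks]
    return {
        "timestamp": ticks[-1]["timestamp"],       # timestamp at which the bar ends
        "start_timestamp": ticks[0]["timestamp"],  # first tick timestamp (hypothetical key name)
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(t["volume"] for t in ticks),
        # Assumption: buy volume is the volume of ticks with aggressor side 1.
        "cum_buy_volume": sum(t["volume"] for t in ticks if t["aggressor_side"] == 1),
        "cum_dollar_value": sum(t["price"] * t["volume"] for t in ticks),
        "num_ticks": len(ticks),
        "ticker": ticker,
    }


ticks = [
    {"timestamp": datetime(2024, 1, 2, 9, 30, 0), "price": 100.0, "volume": 10, "aggressor_side": 1},
    {"timestamp": datetime(2024, 1, 2, 9, 30, 5), "price": 101.0, "volume": 5, "aggressor_side": -1},
    {"timestamp": datetime(2024, 1, 2, 9, 30, 9), "price": 99.5, "volume": 20, "aggressor_side": 1},
]
bar = build_bar(ticks, ticker="XYZ")
```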


Time Bars

Time bars are obtained by sampling information at fixed time intervals, e.g., once every minute. These are the traditional open, high, low, close bars that traders are used to seeing. If you want to know more about time bars please refer to the Standard Data Structures section in MlFinLab.
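
The idea of fixed-interval sampling can be sketched in plain Python. The toy class below is not the MlFinLab generator: `SimpleTimeBars` is a hypothetical name, and closing a bar when the first tick of the next window arrives is an assumption of this sketch.

```python
from datetime import datetime, timezone


class SimpleTimeBars:
    """Toy time-bar builder: one OHLC bar per fixed window of `threshold` seconds."""

    def __init__(self, threshold=60):
        self.threshold = threshold
        self.bars = []        # completed bars
        self._bucket = None   # index of the window currently being filled
        self._prices = []     # prices seen in the current window

    def process_tick(self, timestamp, price):
        """Add one tick; return True if this tick closed the previous bar."""
        bucket = int(timestamp.timestamp()) // self.threshold
        new_bar = False
        if self._bucket is not None and bucket != self._bucket:
            p = self._prices
            self.bars.append({"open": p[0], "high": max(p), "low": min(p), "close": p[-1]})
            self._prices = []
            new_bar = True
        self._bucket = bucket
        self._prices.append(price)
        return new_bar


builder = SimpleTimeBars(threshold=60)
stream = [
    (datetime(2024, 1, 2, 9, 30, 0, tzinfo=timezone.utc), 100.0),
    (datetime(2024, 1, 2, 9, 30, 30, tzinfo=timezone.utc), 102.0),
    (datetime(2024, 1, 2, 9, 30, 59, tzinfo=timezone.utc), 99.0),
    (datetime(2024, 1, 2, 9, 31, 2, tzinfo=timezone.utc), 101.0),  # first tick of the next minute
]
flags = [builder.process_tick(ts, price) for ts, price in stream]
```

Only the last tick returns True, because it is the first one to fall outside the 09:30 window.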

Implementation

class TimeBarGenerator(threshold: int = 86400, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Class which implements time bars compression.

__init__(threshold: int = 86400, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None) object

Initialize Time Bar Generator class.

Parameters:
  • threshold – (int) number of seconds in 1 bar.

  • tick_fields_mapping – (dict) Dict with mapping of tick data field names into those accepted by generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}

  • aggressor_side_mapping – (dict) Dict with mapping of aggressor side values into -1 and 1.

  • exchange – (str) exchange name.

  • contract – (str) contract name.
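
To illustrate what the two mapping arguments are for, the sketch below renames the fields of a raw tick and converts its aggressor side into -1 or 1. The helper `normalize_tick`, the raw field names (`px`, `qty`, `side`) and the `"B"`/`"S"` codes are hypothetical. The direction of `tick_fields_mapping` (generator field name as key, raw field name as value) is inferred from the example further down, and the direction of `aggressor_side_mapping` (raw value as key) is an assumption.

```python
def normalize_tick(raw_tick, tick_fields_mapping, aggressor_side_mapping=None):
    """Rename raw tick fields to generator field names (illustrative only)."""
    tick = {
        target: raw_tick[source]
        for target, source in tick_fields_mapping.items()
        if source in raw_tick  # optional fields may be absent
    }
    if aggressor_side_mapping is not None and "aggressor_side" in tick:
        # Assumption: the mapping goes from the raw side code to -1 or 1.
        tick["aggressor_side"] = aggressor_side_mapping[tick["aggressor_side"]]
    return tick


raw = {"date": "2024-01-02 09:30:00", "px": 100.5, "qty": 3, "side": "B"}
fields = {"timestamp": "date", "price": "px", "volume": "qty", "aggressor_side": "side"}
sides = {"B": 1, "S": -1}
tick = normalize_tick(raw, fields, sides)
```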

apply_tick_rule(price: float) → Tuple[int, float]

Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.

Parameters:
  • price – (float) Price at time t.

Returns:
  (Tuple[int, float]) The signed tick and tick difference.
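
For readers without the book at hand, the tick rule assigns each trade the sign of its price change and carries the previous sign forward when the price is unchanged. A minimal sketch follows. The class name `TickRule` and the initial sign of 0 before any price change has been seen are assumptions, not the MlFinLab implementation.

```python
class TickRule:
    """Minimal tick rule: sign of the price change, carried forward when unchanged."""

    def __init__(self):
        self.prev_price = None
        self.prev_sign = 0  # assumption: neutral sign until the first price change

    def apply(self, price):
        """Return (signed tick, price difference) for the new price."""
        diff = 0.0 if self.prev_price is None else price - self.prev_price
        if diff > 0:
            sign = 1
        elif diff < 0:
            sign = -1
        else:
            sign = self.prev_sign  # unchanged price keeps the previous sign
        self.prev_price = price
        self.prev_sign = sign
        return sign, diff


rule = TickRule()
results = [rule.apply(p) for p in [100.0, 101.0, 101.0, 100.5, 100.5]]
signs = [sign for sign, _ in results]
# signs == [0, 1, 1, -1, -1]
```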

process_tick(tick: dict) → bool

Process one tick.

Parameters:
  • tick – (dict) Tick to process: {'timestamp': pd.Timestamp, 'price': float, 'volume': float, 'aggressor_side' (optional): int}

Returns:
  (bool) Flag indicating that a new bar was formed.

set_threshold(threshold: float)

Set new threshold for bar calculations.

Parameters:
  • threshold – (float) Threshold to set.

Attributes: TimeBarGenerator.bars contains the information for each bar.

Example

The TimeBarGenerator class exposes a bars attribute that holds, for each bar, the timestamp, start timestamp, open, high, low, close, volume, cumulative buy volume, number of ticks, percent stacked ticks and ticker.

  • timestamp refers to the last timestamp in the bar.

  • start timestamp refers to the first tick timestamp of the bar as specified by the threshold in seconds.

>>> import pandas as pd
>>> from mlfinlab.online_data_structures import time_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate time bars according to threshold in seconds
>>> bars_time = time_bars.TimeBarGenerator(
...     threshold=60,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the time bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_time.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_time.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...     # If new_bar is True then a new bar is generated
...     if new_bar:
...         # We can change the threshold while processing bars
...         # For example when the first bar is formed we change the threshold to two minutes
...         bars_time.set_threshold(120)
...
>>> # Access all the generated bars
>>> generated_bars = bars_time.bars
>>> generated_bars  
[...]

Volume Bars

Volume bars sample information every time a pre-defined amount of the security’s units (shares, futures contracts, etc.) has been exchanged. If you want to know more about volume bars please refer to the Standard Data Structures section in MlFinLab.
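
The sampling rule can be sketched as a running sum that is compared against the threshold. This is only an illustration: the function `volume_bar_flags` is hypothetical, and resetting the counter to zero without carrying over surplus volume is an assumption of the sketch.

```python
def volume_bar_flags(volumes, threshold):
    """Return, for each tick, whether it completes a volume bar."""
    flags = []
    cum_volume = 0.0
    for volume in volumes:
        cum_volume += volume
        if cum_volume >= threshold:
            flags.append(True)
            cum_volume = 0.0  # assumption: surplus volume is not carried over
        else:
            flags.append(False)
    return flags


flags = volume_bar_flags([400, 300, 500, 200, 900, 100], threshold=1000)
# flags == [False, False, True, False, True, False]
```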

Implementation

class VolumeBarGenerator(threshold: float, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Class for Volume Bar Generation.

__init__(threshold: float, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Initialize Volume Bar Generator class.

Parameters:
  • threshold – (float) volume amount threshold.

  • tick_fields_mapping – (dict) Dict with mapping of tick data field names into those accepted by generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}

  • aggressor_side_mapping – (dict) Dict with mapping of aggressor side values into -1 and 1.

  • exchange – (str) exchange name.

  • contract – (str) contract name.

apply_tick_rule(price: float) → Tuple[int, float]

Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.

Parameters:
  • price – (float) Price at time t.

Returns:
  (Tuple[int, float]) The signed tick and tick difference.

process_tick(tick: dict | list) → bool

Process one tick or a list of ticks.

Returns:
  (bool) Flag indicating that a new bar was formed.

set_threshold(threshold: float)

Set new threshold for bar calculations.

Parameters:
  • threshold – (float) Threshold to set.

Attributes: VolumeBarGenerator.bars contains the information for each bar.

Example

The VolumeBarGenerator class exposes a bars attribute that holds, for each bar, the timestamp, start timestamp, open, high, low, close, volume, cumulative buy volume, number of ticks, percent stacked ticks and ticker.

  • timestamp refers to the last timestamp in the bar.

  • start timestamp refers to the timestamp of the first tick in the bar.

>>> import pandas as pd
>>> from mlfinlab.online_data_structures import volume_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate volume bars according to volume traded
>>> bars_volume = volume_bars.VolumeBarGenerator(
...     threshold=1000,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the volume bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_volume.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_volume.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...     # If new_bar is True then a new bar is generated
...     if new_bar:
...         # We can change the threshold while processing bars
...         # For example, when the first bar is formed we change the volume threshold to 2000 (the volume that must trade to form a new bar)
...         bars_volume.set_threshold(2000)
...
>>> # Access all the generated bars
>>> generated_bars = bars_volume.bars
>>> generated_bars  
[...]

Dollar Bars

Dollar bars are formed by sampling an observation every time a pre-defined market value is exchanged. If you want to know more about dollar bars please refer to the Standard Data Structures section in MlFinLab.
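
A minimal sketch of this rule accumulates price times volume and closes a bar once the running dollar value reaches the threshold. The function `dollar_bar_ends` is hypothetical, and resetting the counter to zero after each bar is an assumption, so this is not necessarily how MlFinLab handles the surplus.

```python
def dollar_bar_ends(ticks, threshold):
    """Return the indices of the ticks that complete a dollar bar.

    `ticks` is an iterable of (price, volume) pairs.
    """
    ends = []
    cum_dollar_value = 0.0
    for i, (price, volume) in enumerate(ticks):
        cum_dollar_value += price * volume
        if cum_dollar_value >= threshold:
            ends.append(i)
            cum_dollar_value = 0.0  # assumption: start the next bar from zero
    return ends


ticks = [(100.0, 50), (101.0, 60), (99.0, 30), (100.0, 80), (102.0, 10)]
bar_ends = dollar_bar_ends(ticks, threshold=10_000)
# bar_ends == [1, 3]
```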

Implementation

class DollarBarGenerator(threshold: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Class which implements dollar bars compression.

__init__(threshold: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Initialize Dollar Bar Generator class.

Parameters:
  • threshold – (int) dollar amount threshold.

  • tick_fields_mapping – (dict) Dict with mapping of tick data field names into those accepted by generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}

  • aggressor_side_mapping – (dict) Dict with mapping of aggressor side values into -1 and 1.

  • exchange – (str) exchange name.

  • contract – (str) contract name.

apply_tick_rule(price: float) → Tuple[int, float]

Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.

Parameters:
  • price – (float) Price at time t.

Returns:
  (Tuple[int, float]) The signed tick and tick difference.

process_tick(tick: dict | list) → bool

Process one tick or a list of ticks.

Returns:
  (bool) Flag indicating that a new bar was formed.

set_threshold(threshold: float)

Set new threshold for bar calculations.

Parameters:
  • threshold – (float) Threshold to set.

Attributes: DollarBarGenerator.bars contains the information for each bar.

Example

The DollarBarGenerator class exposes a bars attribute that holds, for each bar, the timestamp, start timestamp, open, high, low, close, volume, cumulative buy volume, number of ticks, percent stacked ticks and ticker.

  • timestamp refers to the last timestamp in the bar.

  • start timestamp refers to the timestamp of the first tick in the bar.

>>> import pandas as pd
>>> from mlfinlab.online_data_structures import dollar_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate dollar bars according to dollar amount traded
>>> bars_dollar = dollar_bars.DollarBarGenerator(
...     threshold=1_000_000,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the dollar bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_dollar.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_dollar.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...     # If new_bar is True then a new bar is generated
...     if new_bar:
...         # User specified condition to change the threshold
...         # For example when the first bar is formed we change the dollar threshold to 2 000 000
...         bars_dollar.set_threshold(2_000_000)
...
>>> # Access all the generated bars
>>> generated_bars = bars_dollar.bars
>>> generated_bars  
[...]

Imbalance Bars

Imbalance bars are a type of information bar, in which a bar is sampled when new information enters the market. Two types of imbalance bars are implemented in MlFinLab:

  • Expected number of ticks defined as an EMA (book implementation).

  • Constant expected number of ticks.

If you want to know more about imbalance bars please refer to the Imbalance Bars section in MlFinLab.
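
As a simplified illustration of tick imbalance sampling, the sketch below sums signed ticks and closes a bar once the absolute imbalance reaches the expected number of ticks times the absolute expected imbalance per tick. Both expectations are held fixed here, which is an assumption made to keep the example short. The EMA updates of these expectations are omitted, and the function `tick_imbalance_bar_ends` is hypothetical.

```python
def tick_imbalance_bar_ends(signed_ticks, exp_num_ticks, exp_imbalance):
    """Return the indices of the ticks that complete a tick imbalance bar.

    `signed_ticks` holds -1/1 values, e.g. produced by the tick rule.
    """
    # Fixed threshold: expected bar length times absolute expected imbalance per tick.
    threshold = exp_num_ticks * abs(exp_imbalance)
    ends = []
    theta = 0  # running imbalance inside the current bar
    for i, b in enumerate(signed_ticks):
        theta += b
        if abs(theta) >= threshold:
            ends.append(i)
            theta = 0
    return ends


signed = [1, 1, -1, 1, 1, 1, -1, -1, -1, -1, -1, 1]
bar_ends = tick_imbalance_bar_ends(signed, exp_num_ticks=8, exp_imbalance=0.5)
# bar_ends == [5, 9]
```

The first bar closes after a net buying imbalance of 4, the second after a net selling imbalance of 4.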

Implementation

class ImbalanceBarGenerator(imbalance_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Class which implements imbalance bars compression.

__init__(imbalance_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Initialize Imbalance Bar Generator class.

Parameters:
  • imbalance_type – (str) Type of the imbalance bar. Possible types: 'tick_imbalance', 'dollar_imbalance', 'volume_imbalance'

  • expected_imbalance_window – (int) Window used to estimate expected imbalance from previous trades

  • exp_num_ticks_init – (int) Initial estimate for expected number of ticks in bar. For Const Imbalance Bars expected number of ticks equals expected number of ticks init.

  • tick_fields_mapping – (dict) Dict with mapping of tick data field names into those accepted by generator. Field inputs should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}

  • aggressor_side_mapping – (dict) Dict with mapping of aggressor side values into -1 and 1.

  • exchange – (str) exchange name.

  • contract – (str) contract name.

apply_tick_rule(price: float) → Tuple[int, float]

Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.

Parameters:
  • price – (float) Price at time t.

Returns:
  (Tuple[int, float]) The signed tick and tick difference.

process_tick(tick: dict | list) → bool

Process one tick or a list of ticks.

Returns:
  (bool) Flag indicating that a new bar was formed.

set_threshold(threshold: float)

Set new threshold for bar calculations.

Parameters:
  • threshold – (float) Threshold to set.

Attributes: ImbalanceBarGenerator.bars contains the information for each bar.

Example

The ImbalanceBarGenerator class exposes a bars attribute that holds, for each bar, the timestamp, start timestamp, open, high, low, close, volume, cumulative buy volume, number of ticks, percent stacked ticks and ticker.

  • timestamp refers to the last timestamp in the bar.

  • start timestamp refers to the timestamp of the first tick in the bar.

>>> import pandas as pd
>>> from mlfinlab.online_data_structures import imbalance_bars
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate imbalance dollar bars according to expected imbalance window
>>> # and expected number of initial ticks per bar
>>> bars_imbalance_bars = imbalance_bars.ImbalanceBarGenerator(
...     imbalance_type="dollar_imbalance",
...     expected_imbalance_window=3,
...     exp_num_ticks_init=30,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the imbalance dollar bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_imbalance_bars.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_imbalance_bars.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...
>>> # Access all the generated bars
>>> generated_bars = bars_imbalance_bars.bars
>>> generated_bars  
[...]

Run Bars

Run bars are also a type of information bar, in which a bar is sampled when new information enters the market. Run bars share the same mathematical structure as imbalance bars; however, instead of looking at each individual trade, we look at sequences of trades in the same direction. If you want to know more about run bars please refer to the Run Bars section in MlFinLab.
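
A simplified tick-run sketch is given below. It follows the definition in Advances in Financial Machine Learning, where the run statistic is the larger of the buy and sell tick counts accumulated in the current bar, and a bar closes once that count reaches the expected number of ticks times the larger of the expected buy and sell proportions. Both expectations are held fixed here (an assumption), the EMA updates are omitted, and the function `tick_run_bar_ends` is hypothetical.

```python
def tick_run_bar_ends(signed_ticks, exp_num_ticks, exp_buy_proportion):
    """Return the indices of the ticks that complete a tick run bar.

    `signed_ticks` holds -1/1 values, e.g. produced by the tick rule.
    """
    # Fixed threshold: expected bar length times the dominant expected side proportion.
    threshold = exp_num_ticks * max(exp_buy_proportion, 1 - exp_buy_proportion)
    ends = []
    buys = sells = 0  # buy and sell tick counts inside the current bar
    for i, b in enumerate(signed_ticks):
        if b > 0:
            buys += 1
        elif b < 0:
            sells += 1
        if max(buys, sells) >= threshold:
            ends.append(i)
            buys = sells = 0
    return ends


signed = [1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1]
bar_ends = tick_run_bar_ends(signed, exp_num_ticks=8, exp_buy_proportion=0.5)
# bar_ends == [5, 10]
```

Unlike the imbalance statistic, opposite-side ticks do not cancel here: the first bar closes on the fourth buy even though two sells occurred in between.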

Implementation

class RunsBarGenerator(runs_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Class which implements runs bars compression.

__init__(runs_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)

Initialize Runs Bar Generator class.

Parameters:
  • runs_type – (str) Type of the runs bar. Possible types: 'tick_runs', 'dollar_runs', 'volume_runs'

  • expected_imbalance_window – (int) Window used to estimate expected imbalance from previous trades

  • exp_num_ticks_init – (int) Initial estimate for expected number of ticks in bar. For Const Run Bars expected number of ticks equals expected number of ticks init.

  • tick_fields_mapping – (dict) Dict with mapping of tick data field names into those accepted by generator. Field inputs should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}

  • aggressor_side_mapping – (dict) Dict with mapping of aggressor side values into -1 and 1.

  • exchange – (str) exchange name.

  • contract – (str) contract name.

apply_tick_rule(price: float) → Tuple[int, float]

Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.

Parameters:
  • price – (float) Price at time t.

Returns:
  (Tuple[int, float]) The signed tick and tick difference.

process_tick(tick: dict | list) → bool

Process one tick or a list of ticks.

Returns:
  (bool) Flag indicating that a new bar was formed.

set_threshold(threshold: float)

Set new threshold for bar calculations.

Parameters:
  • threshold – (float) Threshold to set.

Attributes: RunsBarGenerator.bars contains the information for each bar.

Example

The RunsBarGenerator class exposes a bars attribute that holds, for each bar, the timestamp, start timestamp, open, high, low, close, volume, cumulative buy volume, number of ticks, percent stacked ticks and ticker.

  • timestamp refers to the last timestamp in the bar.

  • start timestamp refers to the timestamp of the first tick in the bar.

>>> import pandas as pd
>>> from mlfinlab.online_data_structures import runs_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate run dollar bars according to expected imbalance window
>>> # and expected number of initial ticks per bar
>>> bars_run_dollar = runs_bars.RunsBarGenerator(
...     runs_type="dollar_runs",
...     expected_imbalance_window=3,
...     exp_num_ticks_init=30,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the run dollar bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_run_dollar.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_run_dollar.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...
>>> # Access all the generated bars
>>> generated_bars = bars_run_dollar.bars
>>> generated_bars  
[...]


