Online Data Structures
The classes below use tick data to produce time, volume, dollar, imbalance and run bars in a live streaming fashion: new ticks can be fed in as they arrive, and bars are produced on the go. The functionality matches that described in the Standard Data Structures, Imbalance Bars and Run Bars sections of MlFinLab. To process a large existing dataset, such as a CSV file, a Parquet file or a pandas DataFrame, use the standard Data Structures functionality; use these Online Data Structure classes to generate bars on the go.
For those new to the topic, it is discussed in the graduate-level textbook Advances in Financial Machine Learning, Chapter 2.
Note
Underlying Literature
The following sources elaborate extensively on the topic:
- Advances in Financial Machine Learning, Chapter 2 by Marcos Lopez de Prado.
Tip
A fundamental paper that you need to read to have a better grasp on these concepts is: Easley, David, Marcos M. López de Prado, and Maureen O’Hara. “The volume clock: Insights into the high-frequency paradigm.” The Journal of Portfolio Management 39.1 (2012): 19-29.
Each generated bar contains the following:
Key | Description
---|---
timestamp | the timestamp at which the bar ends
start timestamp | the first tick timestamp of the bar as specified by the threshold
open | the open price of the security in the generated bar
high | the highest price of the security in the generated bar
low | the lowest price of the security in the generated bar
close | the close price of the security in the generated bar
volume | the volume of the security traded in the bar
cum_buy_volume | cumulative buy volume of ticks in the bar
cum_dollar_value | cumulative dollar value of ticks in the bar
tick_rule_buy_volume | amount of buy volume estimated by the Tick Rule
num_ticks | number of ticks in the bar
ticker | chosen ticker for the class
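The sketch below shows roughly how these fields relate to the underlying ticks by compressing a list of ticks into one bar record. It is not MlFinLab's implementation. The Tick class, the build_bar function and the key names are hypothetical. Buy volume is assumed to come from ticks with aggressor side +1, and tick_rule_buy_volume is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Tick:
    """One trade (hypothetical). side is the aggressor side: +1 buyer-, -1 seller-initiated."""
    timestamp: datetime
    price: float
    volume: float
    side: int


def build_bar(ticks: List[Tick], ticker: str) -> dict:
    """Compress a non-empty, time-ordered list of ticks into one bar record."""
    if not ticks:
        raise ValueError("a bar needs at least one tick")
    prices = [t.price for t in ticks]
    return {
        "timestamp": ticks[-1].timestamp,        # the bar ends at its last tick
        "start_timestamp": ticks[0].timestamp,   # first tick of the bar
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(t.volume for t in ticks),
        # Assumption: a buy is a tick whose aggressor side is +1
        "cum_buy_volume": sum(t.volume for t in ticks if t.side == 1),
        # Traded value: price times volume, summed over the bar
        "cum_dollar_value": sum(t.price * t.volume for t in ticks),
        "num_ticks": len(ticks),
        "ticker": ticker,
    }


ticks = [
    Tick(datetime(2024, 1, 2, 9, 30, 0), 100.0, 5.0, 1),
    Tick(datetime(2024, 1, 2, 9, 30, 1), 101.0, 3.0, -1),
    Tick(datetime(2024, 1, 2, 9, 30, 2), 99.5, 2.0, 1),
]
bar = build_bar(ticks, ticker="ABC")
print(bar)
```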
Time Bars
Time bars are obtained by sampling information at fixed time intervals, e.g., once every minute. These are the traditional open, high, low, close bars that traders are used to seeing. If you want to know more about time bars please refer to the Standard Data Structures section in MlFinLab.
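Here is a minimal sketch of streaming time bars. It assumes Unix timestamps in seconds, ticks that arrive in time order, and bars aligned to multiples of the interval. SimpleTimeBars is a hypothetical class, not TimeBarGenerator, and its handling of interval boundaries is an assumption.

```python
from typing import List, Optional


class SimpleTimeBars:
    """Hypothetical streaming time-bar builder; not MlFinLab's TimeBarGenerator."""

    def __init__(self, seconds: int = 60):
        self.seconds = seconds
        self.bars: List[dict] = []
        self._bucket: Optional[int] = None    # index of the interval being filled
        self._current: Optional[dict] = None  # bar under construction

    def process_tick(self, ts: float, price: float, volume: float) -> bool:
        """ts is a Unix timestamp in seconds; returns True when a bar is closed."""
        bucket = int(ts // self.seconds)
        closed = False
        # Assuming time-ordered ticks, a tick from another interval closes the open bar
        if self._current is not None and bucket != self._bucket:
            self.bars.append(self._current)
            self._current = None
            closed = True
        if self._current is None:
            self._bucket = bucket
            self._current = {"start": bucket * self.seconds, "open": price,
                             "high": price, "low": price, "close": price, "volume": 0.0}
        bar = self._current
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += volume
        return closed


gen = SimpleTimeBars(seconds=60)
flags = [gen.process_tick(ts, p, v) for ts, p, v in
         [(0, 10.0, 1.0), (30, 11.0, 2.0), (59, 9.0, 1.0), (61, 10.5, 1.0)]]
print(flags, gen.bars)
```

Ticks arrive one at a time, so the bar for an interval can only be closed when the first tick of a later interval arrives. That is why only the tick at t = 61 returns True here.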
Implementation
- class TimeBarGenerator(threshold: int = 86400, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Class which implements time bars compression.
- __init__(threshold: int = 86400, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None) -> object
  Initialize the Time Bar Generator class.
  Parameters:
  - threshold – (int) number of seconds in one bar.
  - tick_fields_mapping – (dict) mapping of the tick data field names to those accepted by the generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}.
  - aggressor_side_mapping – (dict) mapping of the aggressor side values to -1 and 1.
  - exchange – (str) exchange name.
  - contract – (str) contract name.
- apply_tick_rule(price: float) -> Tuple[int, float]
  Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.
  Parameters: price – (float) price at time t.
  Returns: (Tuple[int, float]) the signed tick and the tick difference.
- process_tick(tick: dict) -> bool
  Process one tick.
  Parameters: tick – (dict) tick to process: {'timestamp': pd.Timestamp, 'price': float, 'volume': float, 'aggressor_side' (optional): int}.
  Returns: (bool) flag indicating that a new bar was formed.
- set_threshold(threshold: float)
  Set a new threshold for bar calculations.
  Parameters: threshold – (float) threshold to set.
Attributes:
TimeBarGenerator.bars
contains the information for each bar.
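The tick rule behind apply_tick_rule signs each tick with the direction of its price change. When the price is unchanged, it carries the previous sign forward. Below is a hypothetical stand-alone sketch. The TickRule class is illustrative, and returning 0 before the first price change is an assumption that may not match MlFinLab.

```python
from typing import Optional, Tuple


class TickRule:
    """Hypothetical stateful tick rule: b_t = sign(p_t - p_{t-1}), or b_{t-1} if unchanged."""

    def __init__(self) -> None:
        self.prev_price: Optional[float] = None
        self.prev_sign = 0  # assumption: ticks are unsigned (0) until the first price change

    def apply(self, price: float) -> Tuple[int, float]:
        """Return (signed tick, tick difference) for the new price."""
        diff = 0.0 if self.prev_price is None else price - self.prev_price
        if diff != 0:
            self.prev_sign = 1 if diff > 0 else -1
        self.prev_price = price
        return self.prev_sign, diff


rule = TickRule()
signs = [rule.apply(p)[0] for p in [100.0, 101.0, 101.0, 100.5, 100.5, 102.0]]
print(signs)
```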
Example
The bars attribute of a TimeBarGenerator object contains, for each bar, the fields listed in the table above.
- timestamp refers to the last timestamp in the bar.
- start timestamp refers to the first tick timestamp of the bar, as specified by the threshold in seconds.
>>> import pandas as pd
>>> from mlfinlab.online_data_structures import time_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate time bars according to threshold in seconds
>>> bars_time = time_bars.TimeBarGenerator(
...     threshold=60,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the time bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_time.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_time.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...     # If new_bar is True then a new bar is generated
...     if new_bar:
...         # We can change the threshold while processing bars
...         # For example, when the first bar is formed we change the threshold to two minutes
...         bars_time.set_threshold(120)
...
>>> # Access all the generated bars
>>> generated_bars = bars_time.bars
>>> generated_bars
[...]
Volume Bars
Volume bars sample information every time a pre-defined amount of the security’s units (shares, futures contracts, etc.) has been exchanged. If you want to know more about volume bars, please refer to the Standard Data Structures section in MlFinLab.
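Here is a minimal online sketch of volume bars, assuming ticks arrive as (price, volume) pairs. SimpleVolumeBars is hypothetical, not VolumeBarGenerator. This sketch keeps each tick whole, so a bar can overshoot the threshold. How MlFinLab treats the overshoot is not covered here.

```python
from typing import List, Optional


class SimpleVolumeBars:
    """Hypothetical online volume-bar builder; not MlFinLab's VolumeBarGenerator."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.bars: List[dict] = []
        self._current: Optional[dict] = None

    def process_tick(self, price: float, volume: float) -> bool:
        """Add one tick; return True if it completes a bar."""
        if self._current is None:
            self._current = {"open": price, "high": price, "low": price,
                             "close": price, "volume": 0.0, "num_ticks": 0}
        bar = self._current
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += volume
        bar["num_ticks"] += 1
        # The whole tick stays in this bar, so a bar may overshoot the threshold
        if bar["volume"] >= self.threshold:
            self.bars.append(bar)
            self._current = None  # the next tick opens a new bar
            return True
        return False


gen = SimpleVolumeBars(threshold=1000)
flags = [gen.process_tick(p, v) for p, v in
         [(10.0, 400), (10.5, 400), (10.2, 300), (10.1, 500), (10.3, 600)]]
print(flags, gen.bars)
```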
Implementation
- class VolumeBarGenerator(threshold: float, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Class for Volume Bar Generation.
- __init__(threshold: float, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Initialize the Volume Bar Generator class.
  Parameters:
  - threshold – (float) volume amount threshold.
  - tick_fields_mapping – (dict) mapping of the tick data field names to those accepted by the generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}.
  - aggressor_side_mapping – (dict) mapping of the aggressor side values to -1 and 1.
  - exchange – (str) exchange name.
  - contract – (str) contract name.
- apply_tick_rule(price: float) -> Tuple[int, float]
  Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.
  Parameters: price – (float) price at time t.
  Returns: (Tuple[int, float]) the signed tick and the tick difference.
- process_tick(tick: dict | list) -> bool
  Process one tick or a list of ticks.
  Returns: (bool) flag indicating that a new bar was formed.
- set_threshold(threshold: float)
  Set a new threshold for bar calculations.
  Parameters: threshold – (float) threshold to set.
Attributes:
VolumeBarGenerator.bars
contains the information for each bar.
Example
The bars attribute of a VolumeBarGenerator object contains, for each bar, the fields listed in the table above.
- timestamp refers to the last timestamp in the bar.
- start timestamp refers to the timestamp of the first tick in the bar.
>>> import pandas as pd
>>> from mlfinlab.online_data_structures import volume_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate volume bars according to volume traded
>>> bars_volume = volume_bars.VolumeBarGenerator(
...     threshold=1000,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the volume bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_volume.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_volume.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...     # If new_bar is True then a new bar is generated
...     if new_bar:
...         # We can change the threshold while processing bars
...         # For example, when the first bar is formed we change the volume threshold
...         # (the amount of volume traded to form a new bar) to 2000
...         bars_volume.set_threshold(2000)
...
>>> # Access all the generated bars
>>> generated_bars = bars_volume.bars
>>> generated_bars
[...]
Dollar Bars
Dollar bars are formed by sampling an observation every time a pre-defined market value is exchanged. If you want to know more about dollar bars, please refer to the Standard Data Structures section in MlFinLab.
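Dollar bars follow the same sampling pattern as volume bars, except that each tick contributes price * volume. The hypothetical ThresholdBars class below is a sketch under that assumption, not DollarBarGenerator. It also has a set_threshold method so the threshold can be raised after the first bar, mirroring the DollarBarGenerator example on this page.

```python
from typing import Callable, List, Tuple


class ThresholdBars:
    """Hypothetical online sampler: closes a bar once a per-tick measure sums to the threshold."""

    def __init__(self, threshold: float, measure: Callable[[float, float], float]):
        self.threshold = threshold
        self.measure = measure  # maps (price, volume) to the quantity being accumulated
        self.bars: List[dict] = []
        self._ticks: List[Tuple[float, float]] = []
        self._total = 0.0

    def set_threshold(self, threshold: float) -> None:
        """Change the threshold; it applies to the bar being filled and to later bars."""
        self.threshold = threshold

    def process_tick(self, price: float, volume: float) -> bool:
        """Add one tick; return True if it completes a bar."""
        self._ticks.append((price, volume))
        self._total += self.measure(price, volume)
        if self._total < self.threshold:
            return False
        prices = [p for p, _ in self._ticks]
        self.bars.append({"open": prices[0], "high": max(prices), "low": min(prices),
                          "close": prices[-1], "total": self._total,
                          "num_ticks": len(self._ticks)})
        self._ticks, self._total = [], 0.0
        return True


dollar_bars = ThresholdBars(1_000_000, measure=lambda price, volume: price * volume)
flags = []
for price, volume in [(100.0, 4000), (101.0, 3000), (102.0, 3000), (103.0, 10000), (104.0, 10000)]:
    new_bar = dollar_bars.process_tick(price, volume)
    flags.append(new_bar)
    if new_bar:
        # Raise the dollar threshold once a bar has formed
        dollar_bars.set_threshold(2_000_000)
print(flags, dollar_bars.bars)
```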
Implementation
- class DollarBarGenerator(threshold: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Class which implements dollar bars compression.
- __init__(threshold: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Initialize the Dollar Bar Generator class.
  Parameters:
  - threshold – (int) dollar amount threshold.
  - tick_fields_mapping – (dict) mapping of the tick data field names to those accepted by the generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}.
  - aggressor_side_mapping – (dict) mapping of the aggressor side values to -1 and 1.
  - exchange – (str) exchange name.
  - contract – (str) contract name.
- apply_tick_rule(price: float) -> Tuple[int, float]
  Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.
  Parameters: price – (float) price at time t.
  Returns: (Tuple[int, float]) the signed tick and the tick difference.
- process_tick(tick: dict | list) -> bool
  Process one tick or a list of ticks.
  Returns: (bool) flag indicating that a new bar was formed.
- set_threshold(threshold: float)
  Set a new threshold for bar calculations.
  Parameters: threshold – (float) threshold to set.
Attributes:
DollarBarGenerator.bars
contains the information for each bar.
Example
The bars attribute of a DollarBarGenerator object contains, for each bar, the fields listed in the table above.
- timestamp refers to the last timestamp in the bar.
- start timestamp refers to the timestamp of the first tick in the bar.
>>> import pandas as pd
>>> from mlfinlab.online_data_structures import dollar_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate dollar bars according to dollar amount traded
>>> bars_dollar = dollar_bars.DollarBarGenerator(
...     threshold=1_000_000,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the dollar bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_dollar.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_dollar.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...     # If new_bar is True then a new bar is generated
...     if new_bar:
...         # User-specified condition to change the threshold while processing bars
...         # For example, when the first bar is formed we change the dollar threshold to 2,000,000
...         bars_dollar.set_threshold(2_000_000)
...
>>> # Access all the generated bars
>>> generated_bars = bars_dollar.bars
>>> generated_bars
[...]
Imbalance Bars
Imbalance bars are a type of information bar: a bar is sampled when new information enters the market. Two types of imbalance bars are implemented in MlFinLab:
- Expected number of ticks defined as an EMA (the book implementation).
- Constant expected number of ticks.
If you want to know more about imbalance bars, please refer to the Imbalance Bars section in MlFinLab.
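For the tick-imbalance case, Advances in Financial Machine Learning defines θ_T = Σ b_t, the running sum of signed ticks. It closes a bar at the first T for which |θ_T| reaches E0[T]·|2P[b_t = 1] − 1|. Both E0[T] and 2P[b_t = 1] − 1 are exponentially weighted estimates.

The sketch below is hypothetical, not ImbalanceBarGenerator. It makes these assumptions:
- It takes already-signed ticks.
- It uses EMA weights 2/(window + 1).
- It replaces any library warm-up with an assumed exp_imbalance_init parameter.
- Setting constant_ticks=True keeps E0[T] fixed, mimicking the constant variant.

Dollar or volume imbalance bars would weight b_t by p_t·v_t or v_t instead.

```python
from typing import List


class SimpleTickImbalanceBars:
    """Hypothetical EMA-based tick imbalance bars; not MlFinLab's ImbalanceBarGenerator."""

    def __init__(self, exp_num_ticks_init: float, exp_imbalance_init: float,
                 imbalance_window: int = 3, num_ticks_window: int = 3,
                 constant_ticks: bool = False):
        self.exp_num_ticks = float(exp_num_ticks_init)   # E0[T]
        self.exp_imbalance = float(exp_imbalance_init)   # estimate of 2P[b_t = 1] - 1
        self.alpha_imb = 2.0 / (imbalance_window + 1)    # EMA weight for b_t
        self.alpha_ticks = 2.0 / (num_ticks_window + 1)  # EMA weight for bar lengths
        self.constant_ticks = constant_ticks             # True: keep E0[T] fixed
        self.theta = 0       # running imbalance of the current bar
        self.num_ticks = 0   # ticks in the current bar
        self.bar_lengths: List[int] = []

    def process_signed_tick(self, b: int) -> bool:
        """Add a tick signed +1 or -1 (e.g. by the tick rule); return True if a bar closes."""
        self.theta += b
        self.num_ticks += 1
        # The threshold uses expectations formed before this tick ...
        threshold = self.exp_num_ticks * abs(self.exp_imbalance)
        # ... and this tick is then folded into the EMA of signed ticks
        self.exp_imbalance += self.alpha_imb * (b - self.exp_imbalance)
        if abs(self.theta) < threshold:
            return False
        self.bar_lengths.append(self.num_ticks)
        if not self.constant_ticks:
            # EMA of observed bar lengths, used as E0[T] for the next bar
            self.exp_num_ticks += self.alpha_ticks * (self.num_ticks - self.exp_num_ticks)
        self.theta, self.num_ticks = 0, 0
        return True


signed_ticks = [1, -1, 1, 1, 1, 1]
ema_bars = SimpleTickImbalanceBars(exp_num_ticks_init=2, exp_imbalance_init=1.0)
const_bars = SimpleTickImbalanceBars(exp_num_ticks_init=2, exp_imbalance_init=1.0, constant_ticks=True)
ema_flags = [ema_bars.process_signed_tick(b) for b in signed_ticks]
const_flags = [const_bars.process_signed_tick(b) for b in signed_ticks]
print(ema_flags, ema_bars.bar_lengths)
print(const_flags, const_bars.bar_lengths)
```

When the estimated imbalance is close to zero, the threshold is close to zero too, so bars can close after very few ticks. This happens on the third tick of the usage example, where the EMA of signed ticks has just dropped to 0.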
Implementation
- class ImbalanceBarGenerator(imbalance_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Class which implements imbalance bars compression.
- __init__(imbalance_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Initialize the Imbalance Bar Generator class.
  Parameters:
  - imbalance_type – (str) type of the imbalance bar. Possible types: 'tick_imbalance', 'dollar_imbalance', 'volume_imbalance'.
  - expected_imbalance_window – (int) window used to estimate the expected imbalance from previous trades.
  - exp_num_ticks_init – (int) initial estimate of the expected number of ticks in a bar. For constant imbalance bars, the expected number of ticks stays equal to exp_num_ticks_init.
  - tick_fields_mapping – (dict) mapping of the tick data field names to those accepted by the generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}.
  - aggressor_side_mapping – (dict) mapping of the aggressor side values to -1 and 1.
  - exchange – (str) exchange name.
  - contract – (str) contract name.
- apply_tick_rule(price: float) -> Tuple[int, float]
  Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.
  Parameters: price – (float) price at time t.
  Returns: (Tuple[int, float]) the signed tick and the tick difference.
- process_tick(tick: dict | list) -> bool
  Process one tick or a list of ticks.
  Returns: (bool) flag indicating that a new bar was formed.
- set_threshold(threshold: float)
  Set a new threshold for bar calculations.
  Parameters: threshold – (float) threshold to set.
Attributes:
ImbalanceBarGenerator.bars
contains the information for each bar.
Example
The bars attribute of an ImbalanceBarGenerator object contains, for each bar, the fields listed in the table above.
- timestamp refers to the last timestamp in the bar.
- start timestamp refers to the timestamp of the first tick in the bar.
>>> import pandas as pd
>>> from mlfinlab.online_data_structures import imbalance_bars
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate imbalance dollar bars according to expected imbalance window
>>> # and expected number of initial ticks per bar
>>> bars_imbalance_bars = imbalance_bars.ImbalanceBarGenerator(
...     imbalance_type="dollar_imbalance",
...     expected_imbalance_window=3,
...     exp_num_ticks_init=30,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the imbalance dollar bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_imbalance_bars.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_imbalance_bars.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...
>>> # Access all the generated bars
>>> generated_bars = bars_imbalance_bars.bars
>>> generated_bars
[...]
Run Bars
Run bars are also information bars, sampled when new information enters the market. They share the same mathematical structure as imbalance bars; however, instead of looking at each individual trade, they look at sequences of trades in the same direction. If you want to know more about run bars, please refer to the Run Bars section in MlFinLab.
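For tick runs bars, Advances in Financial Machine Learning defines θ_T as the larger of the number of buy ticks and the number of sell ticks in the bar. Breaks in a sequence are allowed, and buys and sells are not netted against each other. A bar closes once θ_T reaches E0[T]·max{P[b_t = 1], 1 − P[b_t = 1]}.

The sketch below is hypothetical, not RunsBarGenerator. It makes these assumptions:
- It takes ticks already signed ±1.
- It estimates P[b_t = 1] and E0[T] with EMAs of weight 2/(window + 1).
- It starts from an assumed exp_buy_prob_init parameter.

```python
from typing import List


class SimpleTickRunsBars:
    """Hypothetical EMA-based tick runs bars; not MlFinLab's RunsBarGenerator."""

    def __init__(self, exp_num_ticks_init: float, exp_buy_prob_init: float,
                 prob_window: int = 3, num_ticks_window: int = 3):
        self.exp_num_ticks = float(exp_num_ticks_init)   # E0[T]
        self.buy_prob = float(exp_buy_prob_init)         # estimate of P[b_t = 1]
        self.alpha_prob = 2.0 / (prob_window + 1)
        self.alpha_ticks = 2.0 / (num_ticks_window + 1)
        self.buys = 0    # buy ticks in the current bar
        self.sells = 0   # sell ticks in the current bar
        self.bar_lengths: List[int] = []

    def process_signed_tick(self, b: int) -> bool:
        """Add a tick signed +1 or -1; return True if a bar closes."""
        if b not in (1, -1):
            raise ValueError("signed tick must be +1 or -1")
        if b == 1:
            self.buys += 1
        else:
            self.sells += 1
        # Buys and sells are counted separately, not netted (sequence breaks allowed)
        theta = max(self.buys, self.sells)
        threshold = self.exp_num_ticks * max(self.buy_prob, 1.0 - self.buy_prob)
        # Fold this tick into the EMA estimate of P[b_t = 1]
        self.buy_prob += self.alpha_prob * ((1.0 if b == 1 else 0.0) - self.buy_prob)
        if theta < threshold:
            return False
        n = self.buys + self.sells
        self.bar_lengths.append(n)
        # EMA of observed bar lengths, used as E0[T] for the next bar
        self.exp_num_ticks += self.alpha_ticks * (n - self.exp_num_ticks)
        self.buys = self.sells = 0
        return True


runs = SimpleTickRunsBars(exp_num_ticks_init=4, exp_buy_prob_init=0.5)
flags = [runs.process_signed_tick(b) for b in [1, 1, -1, 1, 1, -1, -1, -1, -1]]
print(flags, runs.bar_lengths)
```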
Implementation
- class RunsBarGenerator(runs_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Class which implements runs bars compression.
- __init__(runs_type: str, expected_imbalance_window: int, exp_num_ticks_init: int, tick_fields_mapping: dict | None = None, aggressor_side_mapping: dict | None = None, exchange: str | None = None, contract: str | None = None)
  Initialize the Runs Bar Generator class.
  Parameters:
  - runs_type – (str) type of the runs bar. Possible types: 'tick_runs', 'dollar_runs', 'volume_runs'.
  - expected_imbalance_window – (int) window used to estimate the expected imbalance from previous trades.
  - exp_num_ticks_init – (int) initial estimate of the expected number of ticks in a bar. For constant bars, the expected number of ticks stays equal to exp_num_ticks_init.
  - tick_fields_mapping – (dict) mapping of the tick data field names to those accepted by the generator. Field names should be mapped to {'timestamp', 'price', 'volume', 'aggressor_side' (optional)}.
  - aggressor_side_mapping – (dict) mapping of the aggressor side values to -1 and 1.
  - exchange – (str) exchange name.
  - contract – (str) contract name.
- apply_tick_rule(price: float) -> Tuple[int, float]
  Applies the tick rule as defined on page 29 of Advances in Financial Machine Learning.
  Parameters: price – (float) price at time t.
  Returns: (Tuple[int, float]) the signed tick and the tick difference.
- process_tick(tick: dict | list) -> bool
  Process one tick or a list of ticks.
  Returns: (bool) flag indicating that a new bar was formed.
- set_threshold(threshold: float)
  Set a new threshold for bar calculations.
  Parameters: threshold – (float) threshold to set.
Attributes:
RunsBarGenerator.bars
contains the information for each bar.
Example
The bars attribute of a RunsBarGenerator object contains, for each bar, the fields listed in the table above.
- timestamp refers to the last timestamp in the bar.
- start timestamp refers to the timestamp of the first tick in the bar.
>>> import pandas as pd
>>> from mlfinlab.online_data_structures import runs_bars
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> data = pd.read_csv(tick_data_url)
>>> # Generate run dollar bars according to expected imbalance window
>>> # and expected number of initial ticks per bar
>>> bars_run_dollar = runs_bars.RunsBarGenerator(
...     runs_type="dollar_runs",
...     expected_imbalance_window=3,
...     exp_num_ticks_init=30,
...     tick_fields_mapping={
...         "timestamp": "date",
...         "price": "price",
...         "volume": "volume",
...         "aggressor_side": "aggressor_side",
...         "ticker": "ticker",
...     },
... )
>>> # Get the run dollar bars from the ticks
>>> # Range in this case specifies the number of incoming ticks to iterate through
>>> for i in range(2000):
...     agg_side = bars_run_dollar.apply_tick_rule(data["price"][i])[0]
...     new_bar = bars_run_dollar.process_tick(
...         {
...             "date": pd.to_datetime(data["date"][i]),
...             "price": data["price"][i],
...             "volume": data["volume"][i],
...             "aggressor_side": agg_side,
...             "ticker": "ticker",
...         }
...     )
...
>>> # Access all the generated bars
>>> generated_bars = bars_run_dollar.bars
>>> generated_bars
[...]
Research Notebook
The following research notebook can be used to better understand the online data structures.