mlfinlab.microstructural_features.feature_generator

Inter-bar feature generator which uses trade data and the bar index to calculate inter-bar features.

Module Contents

Classes

MicrostructuralFeaturesGenerator

Class which is used to generate inter-bar features when bars are already compressed.

class MicrostructuralFeaturesGenerator(trades_input: (str, pandas.DataFrame), tick_num_series: pandas.Series, batch_size: int = 20000000.0, volume_encoding: dict = None, pct_encoding: dict = None)

Class which is used to generate inter-bar features when bars are already compressed.

Parameters:
  • trades_input – (str/pd.DataFrame) Path to the csv file, or a pandas DataFrame, containing raw tick data in the format [date_time, price, volume].

  • tick_num_series – (pd.Series) Series of tick numbers at which each bar was formed.

  • batch_size – (int) Number of rows to read in from the csv, per batch.

  • volume_encoding – (dict) Encoding scheme for trade sizes, used to calculate entropy on encoded messages.

  • pct_encoding – (dict) Dictionary of encoding scheme for log returns used to calculate entropy on encoded messages.
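The exact dictionary layout expected for volume_encoding and pct_encoding is not spelled out here. As a rough illustration of the idea only, the sketch below assumes an encoding maps numeric reference values to single characters, encodes each observation by its nearest reference value, and computes Shannon entropy of the resulting message. The helpers quantile_encoding, encode, and shannon_entropy are hypothetical names written for this example; they are not part of the library.

```python
import math
from bisect import bisect_left
from collections import Counter


def quantile_encoding(values, alphabet="abcd"):
    """Hypothetical helper: map evenly spaced order statistics to letters."""
    ordered = sorted(values)
    last_rank = len(ordered) - 1
    encoding = {}
    for i, letter in enumerate(alphabet):
        # Pick the value sitting at the i-th of len(alphabet) evenly spaced ranks.
        idx = round(i * last_rank / (len(alphabet) - 1))
        encoding[ordered[idx]] = letter
    return encoding


def encode(values, encoding):
    """Encode each value with the letter of its nearest dictionary key."""
    keys = sorted(encoding)
    letters = []
    for value in values:
        pos = bisect_left(keys, value)
        # Only the neighbours around the insertion point can be nearest.
        candidates = keys[max(pos - 1, 0):pos + 1]
        nearest = min(candidates, key=lambda k: abs(k - value))
        letters.append(encoding[nearest])
    return "".join(letters)


def shannon_entropy(message):
    """Shannon entropy (in bits) of the letter frequencies in a message."""
    counts = Counter(message)
    total = len(message)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Illustrative trade sizes (made-up numbers, not real data).
volumes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
volume_encoding = quantile_encoding(volumes)
message = encode(volumes, volume_encoding)
entropy = shannon_entropy(message)
```

A coarser alphabet gives shorter, more repetitive messages and therefore lower entropy; the choice of alphabet size is a modelling decision rather than something fixed by this sketch.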

get_features(verbose=True, to_csv=False, output_path=None)

Reads a csv file of ticks or a pd.DataFrame in batches and constructs the corresponding microstructural intra-bar features: average tick size, tick rule sum, VWAP, Kyle lambda, Amihud lambda, and Hasbrouck lambda, plus Shannon, Lempel-Ziv, and plug-in entropies for the tick, volume, and pct series when the corresponding mapping dictionaries (self.volume_encoding, self.pct_encoding) are provided. The csv file must have exactly 3 columns: date_time, price, and volume.
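To make three of the simpler features concrete, here is a minimal pure-Python sketch for the ticks of a single bar. It assumes "average tick size" means mean traded volume per tick, that the tick rule assigns +1 or -1 by the sign of the price change and carries the previous sign forward when the price is unchanged, and that the first trade is signed +1. The function names are hypothetical, and the library's own computation may differ in details.

```python
def tick_rule(prices):
    """Return +1/-1 trade signs; unchanged prices inherit the previous sign."""
    signs = []
    sign = 1  # assumption: the first trade in the bar is treated as a buy
    for i, price in enumerate(prices):
        if i > 0:
            diff = price - prices[i - 1]
            if diff > 0:
                sign = 1
            elif diff < 0:
                sign = -1
            # diff == 0: keep the previous sign
        signs.append(sign)
    return signs


def bar_features(prices, volumes):
    """Compute a few per-bar features from the ticks inside one bar."""
    total_volume = sum(volumes)
    return {
        "avg_tick_size": total_volume / len(volumes),
        "tick_rule_sum": sum(tick_rule(prices)),
        "vwap": sum(p * v for p, v in zip(prices, volumes)) / total_volume,
    }


# Illustrative ticks for one bar (made-up numbers).
features = bar_features(
    prices=[100.0, 101.0, 101.0, 100.5],
    volumes=[10, 20, 30, 40],
)
```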

Parameters:
  • verbose – (bool) Flag whether to print message on each processed batch or not.

  • to_csv – (bool) Flag for writing the generated features to a local csv file (True) or returning them as an in-memory DataFrame (False).

  • output_path – (str) Path to results file, if to_csv = True.

Returns:

(DataFrame or None) Microstructural features for each bar in the bar index, or None when the results are written to csv (to_csv = True).
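A minimal usage sketch follows, built only from the signatures documented above. The helper names write_ticks_csv and generate_features are hypothetical, the batch_size value is arbitrary, and bar_tick_numbers is assumed to hold the cumulative tick count at which each bar closed. The pandas and mlfinlab imports are deferred into generate_features so that the input-preparation step runs without them.

```python
import csv


def write_ticks_csv(path, ticks):
    """Write (date_time, price, volume) rows under the three required columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date_time", "price", "volume"])
        writer.writerows(ticks)


def generate_features(csv_path, bar_tick_numbers):
    """Run the generator on a tick csv; needs pandas and mlfinlab installed."""
    import pandas as pd
    from mlfinlab.microstructural_features.feature_generator import (
        MicrostructuralFeaturesGenerator,
    )

    generator = MicrostructuralFeaturesGenerator(
        trades_input=csv_path,
        # Assumption: tick numbers at which each bar was formed.
        tick_num_series=pd.Series(bar_tick_numbers),
        batch_size=1_000_000,
    )
    # With to_csv=False the features are expected back in memory.
    return generator.get_features(verbose=False, to_csv=False)
```

Calling generate_features("ticks.csv", [2, 4]) after writing four ticks with write_ticks_csv would, under these assumptions, request features for two bars of two ticks each.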