Sample Datasets

This module provides small data samples that help users test and validate library functions.

The three data sets are:

  • Tick data (2011/07/31 - 2011/07/31)

  • Dollar Bars Data Structure (2015/01/01 - 2015/01/29)

  • ETF Dataset (2008 - 2016)

Tick Data

MlFinLab provides a sample (2011/07/31 - 2011/07/31) of tick data for E-Mini S&P 500 futures, which can be used to test bar compression algorithms, microstructural features, etc. The tick data sample consists of Timestamp, Price and Volume.

load_tick_sample() → DataFrame

Loads E-Mini S&P 500 futures tick data sample.

Returns:

(pd.DataFrame) Frame with tick data sample.

Dollar Bars

We also provide a sample (2015/01/01 - 2015/01/29) of dollar bars for E-Mini S&P 500 futures. The data set has the following columns:

  • Open price (open)

  • High price (high)

  • Low price (low)

  • Close price (close)

  • Volume (cum_volume)

  • Dollar volume traded (cum_dollar)

  • Number of ticks in the bar (cum_ticks)

Tip

You can find more information on dollar bars and other bar compression algorithms in the Data Structures section.
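To make the column layout above concrete, the following is a minimal, dependency-free sketch of how ticks (timestamp, price, volume) could be compressed into dollar bars with those fields. It is not MlFinLab's implementation: the function name build_dollar_bars, the threshold value, the date_time key, the synthetic ticks, and the choice to drop a trailing incomplete bar are all assumptions made for illustration.

```python
from typing import List, Tuple

# One tick: (timestamp, price, volume). Values below are synthetic.
Tick = Tuple[str, float, int]


def build_dollar_bars(ticks: List[Tick], threshold: float) -> List[dict]:
    """Close a bar once its cumulative dollar volume reaches `threshold`.

    Hypothetical helper for illustration only. The tick that crosses
    the threshold is included in the bar it closes.
    """
    bars: List[dict] = []
    bar = None
    for timestamp, price, volume in ticks:
        if bar is None:
            # First tick of a new bar sets the open price.
            bar = {"open": price, "high": price, "low": price, "close": price,
                   "cum_volume": 0, "cum_dollar": 0.0, "cum_ticks": 0}
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["cum_volume"] += volume
        bar["cum_dollar"] += price * volume
        bar["cum_ticks"] += 1
        if bar["cum_dollar"] >= threshold:
            # Stamp the bar with the time of its last tick (assumed convention).
            bar["date_time"] = timestamp
            bars.append(bar)
            bar = None
    # Any trailing, incomplete bar is dropped in this sketch.
    return bars


# Synthetic ticks, not taken from the sample data set.
ticks = [
    ("t1", 100.0, 10),
    ("t2", 101.0, 5),
    ("t3", 99.0, 10),
    ("t4", 100.0, 6),
    ("t5", 102.0, 20),
    ("t6", 103.0, 10),
    ("t7", 104.0, 1),
]

bars = build_dollar_bars(ticks, threshold=3000.0)
for bar in bars:
    print(bar)
```

With real data, the tick sample would play the role of ticks, and the dollar bar sample shows what the resulting frame looks like.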

load_dollar_bar_sample() → DataFrame

Loads E-Mini S&P 500 futures dollar bars data sample.

Returns:

(pd.DataFrame) Frame with dollar bar data sample.

ETF Prices

The data set consists of close prices from 2008 to 2016 for the following ETFs: EEM, EWG, TIP, EWJ, EFA, IEF, EWQ, EWU, XLB, XLE, XLF, LQD, XLK, XLU, EPP, FXI, VGK, VPL, SPY, TLT, BND, CSJ, DIA.

It can be used to test and validate portfolio optimization techniques.
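As a rough illustration of the kind of check a close-price table supports, the sketch below computes simple returns and inverse-variance weights, one of the simplest allocation heuristics. It uses only the standard library and a tiny synthetic price table; the prices, the three tickers chosen, and the helper names simple_returns and inverse_variance_weights are assumptions, not part of MlFinLab or of the sample data.

```python
from statistics import variance
from typing import Dict, List

# Synthetic close prices keyed by ticker (not the real data set).
prices: Dict[str, List[float]] = {
    "SPY": [100.0, 102.0, 101.0, 104.0, 103.0],
    "TLT": [100.0, 100.5, 100.2, 100.8, 100.6],
    "EEM": [50.0, 52.0, 49.0, 53.0, 51.0],
}


def simple_returns(closes: List[float]) -> List[float]:
    """Period-over-period simple returns from a list of close prices."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


def inverse_variance_weights(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Weight each asset by 1 / (sample variance of its returns), normalized to 1."""
    inverse = {ticker: 1.0 / variance(simple_returns(closes))
               for ticker, closes in table.items()}
    total = sum(inverse.values())
    return {ticker: value / total for ticker, value in inverse.items()}


weights = inverse_variance_weights(prices)
for ticker, weight in weights.items():
    print(f"{ticker}: {weight:.4f}")
```

The least volatile series receives the largest weight, which gives a quick sanity check when the same calculation is run on the real close prices.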

load_stock_prices() → DataFrame

Loads the stock prices data set, consisting of close prices for EEM, EWG, TIP, EWJ, EFA, IEF, EWQ, EWU, XLB, XLE, XLF, LQD, XLK, XLU, EPP, FXI, VGK, VPL, SPY, TLT, BND, CSJ, DIA from 2008 to 2016.

Returns:

(pd.DataFrame) The stock_prices data frame.

Example Code

from mlfinlab.datasets import (load_tick_sample, load_stock_prices, load_dollar_bar_sample)

tick_df = load_tick_sample()
dollar_bars_df = load_dollar_bar_sample()
stock_prices_df = load_stock_prices()