mlfinlab.cross_validation.multi_asset_cross_validation

This module implements Purged KFold and Combinatorial Purged KFold for multi-asset (stacked) datasets.

Module Contents

Classes

MultiAssetPurgedKFold

Extends the KFold class to work with labels that span intervals for multi-asset (stacked) datasets.

MultiAssetCombinatorialPurgedKFold

Implements the Combinatorial Purged Cross Validation class (CPCV) to work with labels that span intervals for multi-asset datasets.

Functions

ml_get_train_times_multi_asset(→ pandas.Series)

Advances in Financial Machine Learning, Snippet 7.1, page 106.

ml_get_train_times_multi_asset(samples_info_sets: pandas.Series, test_times: pandas.Series) → pandas.Series

Advances in Financial Machine Learning, Snippet 7.1, page 106.

Purging observations in the training set for multi-asset tasks.

Given the information range on which each record is based and the time ranges of the test set (test_times), this function finds the times of the training observations that remain after purging.

Parameters:
  • samples_info_sets – (pd.Series) The information range on which each record is constructed. samples_info_sets.index: time when the information extraction started. samples_info_sets.values: time when the information extraction ended.

  • test_times – (pd.Series) Times for the test dataset.

Returns:

(pd.Series) Training set.
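As a rough illustration of the purging rule described above, the following sketch follows the logic of Snippet 7.1 on a small stacked series. It is not the library's implementation: the helper name purge_train_times and the sample data are hypothetical, and it assumes that samples_info_sets is indexed by label start time with one row per asset and timestamp.

```python
import pandas as pd


def purge_train_times(samples_info_sets, test_times):
    """Hypothetical helper: drop samples whose interval overlaps a test interval."""
    train = samples_info_sets.copy(deep=True)
    for test_start, test_end in test_times.items():
        # Sample starts inside the test interval.
        starts_within = (train.index >= test_start) & (train.index <= test_end)
        # Sample ends inside the test interval.
        ends_within = ((train >= test_start) & (train <= test_end)).to_numpy()
        # Sample envelops the whole test interval.
        envelops = (train.index <= test_start) & (train >= test_end).to_numpy()
        train = train[~(starts_within | ends_within | envelops)]
    return train


# Two assets stacked on the same six daily timestamps (assumed layout);
# every label spans two days.
dates = pd.date_range("2021-01-01", periods=6, freq="D")
stacked_index = pd.DatetimeIndex(list(dates) * 2)
label_ends = (stacked_index + pd.Timedelta(days=2)).to_numpy()
samples_info_sets = pd.Series(label_ends, index=stacked_index)

# One test interval: starts 2021-01-04, ends 2021-01-05.
test_times = pd.Series(
    [pd.Timestamp("2021-01-05")], index=[pd.Timestamp("2021-01-04")]
)

train = purge_train_times(samples_info_sets, test_times)
print(train)
```

In this toy data only the samples starting on 2021-01-01 and 2021-01-06 survive for each asset, because every other label interval touches the test interval.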

class MultiAssetPurgedKFold(n_splits: int = 3, samples_info_sets: pandas.Series = None, pct_embargo: float = 0.0)

Bases: sklearn.model_selection.KFold

Extends the KFold class to work with labels that span intervals for multi-asset (stacked) datasets.

The training set is purged of observations that overlap test-label intervals. The test set is assumed to be contiguous (shuffle=False), without training samples in between.

split(X: pandas.DataFrame, y: pandas.Series = None, groups: pandas.Series = None) → tuple

The main method to call for the MultiAssetPurgedKFold class.

Parameters:
  • X – (pd.DataFrame) Samples dataset that is to be split.

  • y – (pd.Series) Sample ranking series.

  • groups – (pd.Series) Deprecated parameter, using samples_info_sets as groups.

Returns:

(tuple) Train list of sample indices and test list of sample indices.
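To make the roles of n_splits and pct_embargo more concrete, here is a minimal single-asset sketch of purged k-fold splitting in the spirit of Advances in Financial Machine Learning, Snippet 7.3. It is an assumption-laden illustration rather than the code of MultiAssetPurgedKFold: the function purged_kfold_indices is hypothetical, the index is assumed to be unique and sorted, and the embargo is assumed to be int(n_samples * pct_embargo) observations.

```python
import numpy as np
import pandas as pd


def purged_kfold_indices(t1, n_splits=3, pct_embargo=0.0):
    """Hypothetical helper: yield (train, test) positions with purging and embargo.

    t1 maps each label start time (index) to its end time (value).
    """
    positions = np.arange(len(t1))
    embargo = int(len(t1) * pct_embargo)
    # Contiguous test blocks, as with KFold(shuffle=False).
    for test_pos in np.array_split(positions, n_splits):
        test_start = t1.index[test_pos[0]]
        test_end = t1.iloc[test_pos].max()
        # Left side: labels that end strictly before the test block starts.
        left = positions[(t1 < test_start).to_numpy()]
        # Right side: labels that start after the test labels end,
        # skipping the embargoed observations.
        first_free = t1.index.searchsorted(test_end, side="right") + embargo
        right = positions[first_free:]
        yield np.concatenate((left, right)), test_pos


# Ten daily samples whose labels each span one day (toy data).
index = pd.date_range("2021-01-01", periods=10, freq="D")
t1 = pd.Series(index + pd.Timedelta(days=1), index=index)

folds = list(purged_kfold_indices(t1, n_splits=5, pct_embargo=0.1))
for train_pos, test_pos in folds:
    print("train:", train_pos.tolist(), "test:", test_pos.tolist())
```

For the second fold (test positions 2 and 3), position 1 is purged because its label ends when the test block starts, position 4 is purged because it starts when the last test label ends, and position 5 is dropped by the one-observation embargo.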

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters:
  • X – (object) Always ignored, exists for compatibility.

  • y – (object) Always ignored, exists for compatibility.

  • groups – (object) Always ignored, exists for compatibility.

Returns:

(int) The number of splitting iterations in the cross-validator.

__repr__()

Return repr(self).

class MultiAssetCombinatorialPurgedKFold(n_splits: int = 3, n_test_splits: int = 2, samples_info_sets: pandas.Series = None, pct_embargo: float = 0.0)

Bases: sklearn.model_selection.KFold

Implements Combinatorial Purged Cross Validation class (CPCV) to work with labels that span intervals for multi-asset datasets.

The training set is purged of observations that overlap test-label intervals. The test set is assumed to be contiguous (shuffle=False), without training samples in between.

split(X: pandas.DataFrame, y: pandas.Series = None, groups: pandas.Series = None) → tuple

The main method to call for the MultiAssetCombinatorialPurgedKFold class.

Parameters:
  • X – (pd.DataFrame) Samples dataset that is to be split.

  • y – (pd.Series) Deprecated parameter, sample ranking series.

  • groups – (pd.Series) Deprecated parameter, using samples_info_sets as groups.

Returns:

(tuple) Train list of sample indices and test list of sample indices.

get_folds_splits(combinatorial_groups_splits: list) → list

Find train and test folds for backtest_paths.

The example input for KFold (4, 2) looks like:

[(2, 3), (1, 3), (1, 2), (0, 3), (0, 2), (0, 1)]

The output looks like:

[[{'train': array([2, 3]), 'test': 0}, {'train': array([2, 3]), 'test': 1}, {'train': array([1, 3]), 'test': 2}, {'train': array([1, 2]), 'test': 3}],
 [{'train': array([1, 3]), 'test': 0}, {'train': array([0, 3]), 'test': 1}, {'train': array([1, 3]), 'test': 2}, {'train': array([1, 2]), 'test': 3}],
 [{'train': array([1, 2]), 'test': 0}, {'train': array([0, 2]), 'test': 1}, {'train': array([0, 1]), 'test': 2}, {'train': array([1, 2]), 'test': 3}],
 [{'train': array([1, 3]), 'test': 0}, {'train': array([0, 3]), 'test': 1}, {'train': array([0, 3]), 'test': 2}, {'train': array([0, 2]), 'test': 3}],
 [{'train': array([1, 2]), 'test': 0}, {'train': array([0, 2]), 'test': 1}, {'train': array([0, 1]), 'test': 2}, {'train': array([0, 2]), 'test': 3}],
 [{'train': array([1, 2]), 'test': 0}, {'train': array([0, 2]), 'test': 1}, {'train': array([0, 1]), 'test': 2}, {'train': array([0, 1]), 'test': 3}]]

Parameters:
  • combinatorial_groups_splits – (list) Tuples with train fold splits.

Returns:

(list) Lists of dictionaries of all train/test splits for each fold.
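The example input for KFold (4, 2) can be reproduced with the standard library. The sketch below assumes that each tuple is the set of train folds left over after choosing n_test_splits test folds out of n_splits; that reading matches the example input, but it is an interpretation rather than a statement about how the class builds its input internally. The variable names are hypothetical.

```python
from itertools import combinations
from math import comb

n_splits, n_test_splits = 4, 2
folds = range(n_splits)

# For every choice of test folds, the train folds are the complement.
train_fold_splits = [
    tuple(fold for fold in folds if fold not in test_folds)
    for test_folds in combinations(folds, n_test_splits)
]

print(train_fold_splits)
# The number of train/test combinations is "n_splits choose n_test_splits".
print(len(train_fold_splits) == comb(n_splits, n_test_splits))
```

With 4 folds and 2 test folds this gives 6 combinations, which is why the example input contains six tuples.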

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters:
  • X – (object) Always ignored, exists for compatibility.

  • y – (object) Always ignored, exists for compatibility.

  • groups – (object) Always ignored, exists for compatibility.

Returns:

(int) The number of splitting iterations in the cross-validator.

__repr__()

Return repr(self).