mlfinlab.cross_validation.multi_asset_cross_validation
This module implements Purged KFold and Combinatorial Purged KFold for multi-asset (stacked) datasets.
Module Contents
Classes
MultiAssetPurgedKFold – Extend KFold class to work with labels that span intervals for multi-asset (stacked) datasets.
MultiAssetCombinatorialPurgedKFold – Implements Combinatorial Purged Cross Validation class (CPCV) to work with labels that span intervals.
Functions
ml_get_train_times_multi_asset – Advances in Financial Machine Learning, Snippet 7.1, page 106.
- ml_get_train_times_multi_asset(samples_info_sets: pandas.Series, test_times: pandas.Series) → pandas.Series
Advances in Financial Machine Learning, Snippet 7.1, page 106.
Purges observations in the training set for multi-asset tasks.
Given the information range on which each record is based and the time range of the test set (test_times), this function returns the training observations whose label intervals do not overlap any test interval.
- Parameters:
samples_info_sets – (pd.Series) The information range on which each record is constructed. samples_info_sets.index: time when the information extraction started; samples_info_sets.values: time when the information extraction ended.
test_times – (pd.Series) Times for the test dataset (index: start of each test interval; values: its end).
- Returns:
(pd.Series) Training set.
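The three purging conditions of Snippet 7.1 (a training label starts within, ends within, or envelops a test interval) can be sketched as below. This is a minimal illustration, not mlfinlab's code: the name `purge_train_times` is hypothetical, and it assumes `samples_info_sets` is indexed by label start time (repeated once per asset in a stacked dataset) with label end times as values, and that `test_times` maps each test-interval start to its end. It uses a positional boolean mask because the book's label-based `drop()` would also remove other assets' rows that merely share a start timestamp.

```python
import numpy as np
import pandas as pd


def purge_train_times(samples_info_sets: pd.Series, test_times: pd.Series) -> pd.Series:
    """Hypothetical sketch of AFML Snippet 7.1 purging for a stacked dataset.

    samples_info_sets: index = label start time, values = label end time
    (timestamps may repeat, once per asset). test_times: index = test-interval
    start, values = test-interval end.
    """
    starts = pd.DatetimeIndex(samples_info_sets.index)
    ends = pd.DatetimeIndex(samples_info_sets.to_numpy())
    # Positional mask instead of label-based drop(): with repeated timestamps,
    # dropping by label would also remove other assets' rows sharing a start time.
    keep = np.ones(len(samples_info_sets), dtype=bool)
    for test_start, test_end in test_times.items():
        starts_within = (starts >= test_start) & (starts <= test_end)  # train starts within test
        ends_within = (ends >= test_start) & (ends <= test_end)  # train ends within test
        envelops = (starts <= test_start) & (ends >= test_end)  # train envelops test
        keep &= ~(starts_within | ends_within | envelops)
    return samples_info_sets[keep]
```

Because purging is decided per row rather than per timestamp, an asset whose label ends before the test window keeps its row even when another asset's label with the same start time overlaps the window.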
- class MultiAssetPurgedKFold(n_splits: int = 3, samples_info_sets: pandas.Series = None, pct_embargo: float = 0.0)
Bases: sklearn.model_selection.KFold
Extends the KFold class to work with labels that span intervals for multi-asset (stacked) datasets.
The training set is purged of observations whose label intervals overlap the test-label intervals. The test set is assumed to be contiguous (shuffle=False), with no training samples in between.
- split(X: pandas.DataFrame, y: pandas.Series = None, groups: pandas.Series = None) → tuple
The main method to call for the MultiAssetPurgedKFold class.
- Parameters:
X – (pd.DataFrame) Samples dataset that is to be split.
y – (pd.Series) Sample ranking series.
groups – (pd.Series) Deprecated parameter; samples_info_sets is used as groups instead.
- Returns:
(tuple) Train list of sample indices and test list of sample indices.
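One way the purged split could proceed on a stacked dataset is sketched below. It is a hypothetical illustration (`purged_kfold_split` is not part of mlfinlab) built on two assumptions: folds are cut over the sorted unique label start times, so all assets observed at the same time share a test fold; and the embargo drops training samples whose start falls within the next `pct_embargo` share of unique timestamps after the test fold's last label end, mirroring the book's post-test embargo. Training samples whose [start, end] interval overlaps [first test start, last test label end] are purged, which is equivalent to applying the three conditions of Snippet 7.1 to that single interval.

```python
import numpy as np
import pandas as pd


def purged_kfold_split(samples_info_sets: pd.Series, n_splits: int = 3, pct_embargo: float = 0.0):
    """Hypothetical sketch (not mlfinlab): yield (train, test) integer positions.

    samples_info_sets: index = label start times (repeated once per asset),
    values = label end times.
    """
    starts = pd.DatetimeIndex(samples_info_sets.index)
    ends = pd.DatetimeIndex(samples_info_sets.to_numpy())
    # Assumption: folds are cut over unique timestamps, so all assets observed
    # at the same time fall into the same contiguous test fold.
    unique_times = starts.unique().sort_values()
    n_embargo = int(len(unique_times) * pct_embargo)
    positions = np.arange(len(samples_info_sets))

    for fold in np.array_split(np.arange(len(unique_times)), n_splits):
        first, last = unique_times[fold[0]], unique_times[fold[-1]]
        test_mask = (starts >= first) & (starts <= last)
        test_end = ends[test_mask].max()  # latest label end inside the test fold

        # Purge: any label whose [start, end] overlaps [first, test_end].
        purged = (starts <= test_end) & (ends >= first)

        # Embargo (assumption): drop samples starting within the next
        # n_embargo unique timestamps after test_end.
        embargoed = np.zeros(len(samples_info_sets), dtype=bool)
        pos = unique_times.searchsorted(test_end, side="right")
        if n_embargo > 0 and pos < len(unique_times):
            embargo_end = unique_times[min(pos + n_embargo, len(unique_times)) - 1]
            embargoed = (starts > test_end) & (starts <= embargo_end)

        train_mask = ~test_mask & ~purged & ~embargoed
        yield positions[train_mask], positions[test_mask]
```

Returning integer positions rather than labels keeps the output usable with duplicated timestamps, in the same spirit as scikit-learn splitters that yield index arrays.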
- get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
Parameters
- X : object
Always ignored, exists for compatibility.
- y : object
Always ignored, exists for compatibility.
- groups : object
Always ignored, exists for compatibility.
Returns
- n_splits : int
Returns the number of splitting iterations in the cross-validator.
- __repr__()
Return repr(self).
- class MultiAssetCombinatorialPurgedKFold(n_splits: int = 3, n_test_splits: int = 2, samples_info_sets: pandas.Series = None, pct_embargo: float = 0.0)
Bases: sklearn.model_selection.KFold
Implements the Combinatorial Purged Cross-Validation (CPCV) class to work with labels that span intervals for multi-asset datasets.
The training set is purged of observations whose label intervals overlap the test-label intervals. The test set is assumed to be contiguous (shuffle=False), with no training samples in between.
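CPCV partitions the data into N groups and uses every combination of k groups as a test set, which gives C(N, k) splits; the number of backtest paths is then φ = (k / N) · C(N, k) = C(N - 1, k - 1) (Advances in Financial Machine Learning, Chapter 12). The sketch below shows only this bookkeeping, assuming n_splits plays the role of N and n_test_splits the role of k; both helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def cpcv_test_groups(n_splits: int, n_test_splits: int) -> list:
    """Every combination of n_test_splits groups (out of n_splits) is one test set."""
    return list(combinations(range(n_splits), n_test_splits))


def n_backtest_paths(n_splits: int, n_test_splits: int) -> int:
    """phi = k / N * C(N, k); always an integer, equal to C(N - 1, k - 1)."""
    return comb(n_splits, n_test_splits) * n_test_splits // n_splits


if __name__ == "__main__":
    for n, k in [(3, 2), (4, 2), (6, 2)]:
        print(n, k, len(cpcv_test_groups(n, k)), n_backtest_paths(n, k))
```

Each group appears in exactly φ of the test sets, which is what allows φ complete out-of-sample paths to be stitched together.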
- split(X: pandas.DataFrame, y: pandas.Series = None, groups: pandas.Series = None) → tuple
The main method to call for the MultiAssetCombinatorialPurgedKFold class.
- Parameters:
X – (pd.DataFrame) Samples dataset that is to be split.
y – (pd.Series) Deprecated parameter; sample ranking series.
groups – (pd.Series) Deprecated parameter; samples_info_sets is used as groups instead.
- Returns:
(tuple) Train list of sample indices and test list of sample indices.
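A hedged sketch of how a combinatorial split could be produced for stacked data follows. It assumes the groups are contiguous blocks of unique start timestamps and that every combination of n_test_splits groups forms one test set. Each test group is a contiguous block, but two test groups need not be adjacent, so each one is purged separately. `combinatorial_purged_split` is a hypothetical name, no embargo is applied, and this is not mlfinlab's implementation.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def combinatorial_purged_split(samples_info_sets: pd.Series, n_splits: int = 3, n_test_splits: int = 2):
    """Hypothetical sketch (not mlfinlab): yield (train, test) integer positions
    for every combination of n_test_splits test groups out of n_splits. No embargo."""
    starts = pd.DatetimeIndex(samples_info_sets.index)
    ends = pd.DatetimeIndex(samples_info_sets.to_numpy())
    # Assumption: groups are contiguous blocks of unique start timestamps.
    unique_times = starts.unique().sort_values()
    groups = np.array_split(np.arange(len(unique_times)), n_splits)
    positions = np.arange(len(samples_info_sets))

    for test_groups in combinations(range(n_splits), n_test_splits):
        test_mask = np.zeros(len(positions), dtype=bool)
        purged = np.zeros(len(positions), dtype=bool)
        for g in test_groups:
            first, last = unique_times[groups[g][0]], unique_times[groups[g][-1]]
            in_group = (starts >= first) & (starts <= last)
            test_end = ends[in_group].max()  # latest label end in this test group
            test_mask |= in_group
            # Purge each test block separately: the blocks need not be adjacent.
            purged |= (starts <= test_end) & (ends >= first)
        yield positions[~test_mask & ~purged], positions[test_mask]
```

With long labels or few timestamps per group, purging can leave very little training data for some combinations, so N and k are usually chosen with the label horizon in mind.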
- get_folds_splits(combinatorial_groups_splits: list) → list
Finds train and test folds for backtest_paths.
For n_splits=4 and n_test_splits=2, an example input looks like: [(2, 3), (1, 3), (1, 2), (0, 3), (0, 2), (0, 1)]
The output looks like:
[[{'train': array([2, 3]), 'test': 0}, {'train': array([2, 3]), 'test': 1}, {'train': array([1, 3]), 'test': 2}, {'train': array([1, 2]), 'test': 3}],
 [{'train': array([1, 3]), 'test': 0}, {'train': array([0, 3]), 'test': 1}, {'train': array([1, 3]), 'test': 2}, {'train': array([1, 2]), 'test': 3}],
 [{'train': array([1, 2]), 'test': 0}, {'train': array([0, 2]), 'test': 1}, {'train': array([0, 1]), 'test': 2}, {'train': array([1, 2]), 'test': 3}],
 [{'train': array([1, 3]), 'test': 0}, {'train': array([0, 3]), 'test': 1}, {'train': array([0, 3]), 'test': 2}, {'train': array([0, 2]), 'test': 3}],
 [{'train': array([1, 2]), 'test': 0}, {'train': array([0, 2]), 'test': 1}, {'train': array([0, 1]), 'test': 2}, {'train': array([0, 2]), 'test': 3}],
 [{'train': array([1, 2]), 'test': 0}, {'train': array([0, 2]), 'test': 1}, {'train': array([0, 1]), 'test': 2}, {'train': array([0, 1]), 'test': 3}]]
- Parameters:
combinatorial_groups_splits – (list) Tuples of train fold indices, one tuple per split.
- Returns:
(list) Lists of dictionaries of all train/test splits for each fold.
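Path assembly can be sketched as follows, assuming group g is a test group in every split whose train tuple excludes g: path p scores group g with the p-th such split in input order, so every (split, test group) pair is used exactly once. `build_backtest_paths` is a hypothetical helper that takes n_splits explicitly and returns plain tuples instead of NumPy arrays; it is not the library's get_folds_splits.

```python
from typing import Dict, List, Sequence, Tuple


def build_backtest_paths(train_splits: Sequence[Tuple[int, ...]], n_splits: int) -> List[List[Dict]]:
    """Hypothetical sketch: assemble CPCV backtest paths from train-fold tuples."""
    # For each group, the splits (in input order) in which it is held out for testing.
    uses = {g: [s for s in train_splits if g not in s] for g in range(n_splits)}
    n_paths = len(uses[0])
    # In a full CPCV design every group is tested the same number of times.
    if any(len(u) != n_paths for u in uses.values()):
        raise ValueError("groups are not tested an equal number of times")
    # Path p uses, for every group g, the p-th split in which g is a test group.
    return [[{"train": uses[g][p], "test": g} for g in range(n_splits)] for p in range(n_paths)]
```

For the (4, 2) input above it returns three paths, which match the first, fourth and sixth lists of the example output.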
- get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
Parameters
- X : object
Always ignored, exists for compatibility.
- y : object
Always ignored, exists for compatibility.
- groups : object
Always ignored, exists for compatibility.
Returns
- n_splits : int
Returns the number of splitting iterations in the cross-validator.
- __repr__()
Return repr(self).