Noise Reduction
Note
Underlying Literature
The following material and code implementations have been adapted from:
- Kinetic Component Analysis by Marcos Lopez de Prado & Riccardo Rebonato.
In the paper referenced above, the authors, Marcos Lopez de Prado and Riccardo Rebonato, compare different signal processing techniques against their newly proposed Kinetic Component Analysis (KCA). Their proposed technique decomposes a signal into three hidden components, intuitively associated with position, velocity and acceleration.
Any economic observable is defined as \(P(t)\), which may represent prices, rates, yields, or any other asset value quotation,
where \(p(t)\) is defined as the fundamental component and \(h(t)\) as a source of noise.
As markets organically evolve over time, these economic observables acquire a noisy component \(h(t)\), which hides the fundamental component \(p(t)\). The fundamental component is considered to be a more accurate feature of the economic observable \(P(t)\).
The noise reduction methods in the paper aim to reduce the amount of noise in the economic observable of interest in order to obtain a smoother feature given by the fundamental component \(p(t)\).
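To make the decomposition concrete, the following is a minimal sketch that assumes the observable is the sum of the two components, \(P(t) = p(t) + h(t)\). The additive form, the sine-wave fundamental, the Gaussian noise and the moving-average smoother are all illustrative assumptions, not part of the paper or of MlFinLab. It only shows what "reducing noise" means in terms of error against the fundamental component.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

n_points = 300
# Hypothetical fundamental component p(t): a sine wave with 10 full periods.
fundamental = [math.sin(2 * math.pi * 10 * t / n_points) for t in range(n_points)]
# Hypothetical noise h(t): i.i.d. Gaussian with standard deviation 0.5.
noise = [random.gauss(0.0, 0.5) for _ in range(n_points)]
# Assumed additive decomposition: P(t) = p(t) + h(t).
observed = [p + h for p, h in zip(fundamental, noise)]


def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def rmse(a, b):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


smoothed = moving_average(observed)
rmse_observed = rmse(observed, fundamental)
rmse_smoothed = rmse(smoothed, fundamental)
print(f"RMSE observed: {rmse_observed:.3f}, smoothed: {rmse_smoothed:.3f}")
```

In practice \(p(t)\) is not observable, so this error cannot be computed on real data; the comparison is possible here only because the signal is synthetic.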
Below are reference implementations for the methods discussed in the paper. All the noise reduction methods inherit from a common NoiseReductionMethod class. Below we illustrate some examples of the methods introduced to the MlFinLab package.
The implemented noise reduction methods expect a pandas.DataFrame time series with the “noisy signal” as the first column of the dataframe. To run the core noise reduction algorithm on the noisy signal, use the generate_signal() method. For the other class methods provided, see the corresponding implementation reference sections.
>>> import pandas as pd
>>> from mlfinlab.features import kca, fft, lowess
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate classes with time series
>>> # (distinct names avoid shadowing the imported modules)
>>> kca_model = kca.KCA(sine_df, include_std=True, sigma=2)
>>> fft_model = fft.FFT(sine_df, min_alpha=0.05)
>>> lowess_model = lowess.LOWESS(sine_df, fraction=0.1)
>>> # Generate signals and remove noise
>>> kca_position_signal = kca_model.generate_signal()
>>> fft_signal = fft_model.generate_signal()
>>> lowess_signal = lowess_model.generate_signal()
>>> lowess_signal
array([[...
Kinetic Component Analysis (KCA)
Overview
The authors present Kinetic Component Analysis as a state-space application that extracts the signal from a series of noisy measurements by applying a Kalman Filter on a Taylor expansion of a stochastic process.
The advantage of KCA over the other methods explored in the paper is that it provides the following unique capabilities:
Band estimates (e.g. confidence intervals) in addition to point estimates
Additional information via decomposition of three hidden components (position, velocity, acceleration)
Forecasting capabilities
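The state-space idea described above can be sketched with a small Kalman filter whose transition matrix is a second-order Taylor expansion, so the hidden state is (position, velocity, acceleration). This is a simplified illustration, not the MlFinLab KCA implementation: the function name, the fixed process-noise variance q, the measurement-noise variance r and the initial covariance are all assumptions chosen for the example.

```python
import math
import random


def kinetic_kalman_filter(observations, step=1.0, q=1e-5, r=0.25):
    """Filter a 1-D series into (position, velocity, acceleration) states.

    q (process-noise variance, applied to every state) and r (measurement-noise
    variance) are hypothetical fixed inputs; they are not estimated from data.
    """
    # State transition from a second-order Taylor expansion.
    F = [[1.0, step, 0.5 * step * step],
         [0.0, 1.0, step],
         [0.0, 0.0, 1.0]]
    x = [observations[0], 0.0, 0.0]  # start at the first observation
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    states = [tuple(x)]
    for z in observations[1:]:
        # Predict: x = F x and P = F P F^T + q I.
        x = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
        FP = [[sum(F[i][k] * P[k][j] for k in range(3)) for j in range(3)]
              for i in range(3)]
        P = [[sum(FP[i][k] * F[j][k] for k in range(3)) + (q if i == j else 0.0)
              for j in range(3)] for i in range(3)]
        # Update: only position is observed, so H = [1, 0, 0].
        innovation = z - x[0]
        S = P[0][0] + r                      # innovation variance (scalar)
        K = [P[i][0] / S for i in range(3)]  # Kalman gain
        x = [x[i] + K[i] * innovation for i in range(3)]
        P = [[P[i][j] - K[i] * P[0][j] for j in range(3)] for i in range(3)]
        states.append(tuple(x))
    return states


def mean_sq_diff(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
n_points = 300
# Synthetic test signal: 3 sine periods plus Gaussian noise (std 0.5).
truth = [math.sin(2 * math.pi * 3 * t / n_points) for t in range(n_points)]
observed = [p + random.gauss(0.0, 0.5) for p in truth]

tight = [s[0] for s in kinetic_kalman_filter(observed, q=1e-5)]
loose = [s[0] for s in kinetic_kalman_filter(observed, q=1.0)]

rmse_observed = math.sqrt(mean_sq_diff(observed, truth))
rmse_filtered = math.sqrt(mean_sq_diff(tight, truth))
dist_tight = mean_sq_diff(tight, observed)
dist_loose = mean_sq_diff(loose, observed)
print(f"RMSE vs truth: observed={rmse_observed:.3f}, filtered={rmse_filtered:.3f}")
print(f"Distance to observations: q=1e-5 -> {dist_tight:.3f}, q=1.0 -> {dist_loose:.3f}")
```

Raising q relative to r tells the filter that more of the variation comes from the states than from the measurements, so the position estimate follows the noisy observations more closely.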
Implementation
Important
The greater the value of seed, the more likely we are to overfit. A greater value indicates to the model that a greater proportion of the noise comes from the states rather than from the measurements (observations).
The authors recommend trying different values of seed that are consistent with your understanding of the system in question.
Generally, opt for lower values of seed to avoid overfitting.
- class KCA(time_series: DataFrame, include_std: bool | None = False, sigma: int | None = 0, seed: int | None = 0.01)
-
Kinetic Component Analysis (KCA), a state-space application that extracts a signal from a series of noisy measurements by applying a Kalman Filter on a Taylor expansion of a stochastic process.
KCA decomposes the noisy signal in terms of three hidden components, which can be intuitively associated with position, velocity and acceleration.
- property forecast_df: DataFrame
-
Returns DataFrame with forecast.
- Returns:
-
(pd.DataFrame) DataFrame with component forecasts.
- generate_next_point_forecast() Series
-
Generate forecast for next point in signal (position).
- Returns:
-
(pd.Series) Next point forecast row with all components.
- generate_signal() array
-
Generates signal (position) and other components (velocity and acceleration) by running core KCA fit method.
- Returns:
-
(np.array) Main generated signal (position).
- generate_signal_forecast(num_to_forecast: float) array
-
Generate forecast of main signal (position) and its other components.
- Parameters:
-
num_to_forecast – (int) Number of points to forecast
- Returns:
-
(np.array) Main signal (position) forecast.
- generate_signal_with_forecast(num_to_forecast: int) array
-
Generate forecast of signal (plus components) and return main signal (position) with forecast included.
- Parameters:
-
num_to_forecast – (int) Number of points to forecast
- Returns:
-
(np.array) Main signal (position) with forecast included.
- get_acceleration_signal() array
-
Returns KCA acceleration component signal.
- Returns:
-
(np.array) Acceleration component signal.
- get_position_signal() array
-
Returns KCA position component signal.
- Returns:
-
(np.array) Position signal.
- get_velocity_signal() array
-
Returns KCA velocity component signal.
- Returns:
-
(np.array) Velocity component signal.
- property seed: float
-
Returns the current seed value.
- Returns:
-
(float) Current seed value.
- update_seed(new_seed: float)
-
Update seed value.
- Parameters:
-
new_seed – (float) New seed value to set.
Example
>>> import pandas as pd
>>> from mlfinlab.features import kca
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate KCA class with time series
>>> # (include standard deviation and confidence intervals)
>>> kca_model = kca.KCA(sine_df, include_std=True, sigma=2, seed=0.001)
>>> # Generate a signal (KCA position)
>>> position = kca_model.generate_signal()
>>> # Get KCA dataframe with all components
>>> kca_df = kca_model.dataframe
>>> # Forecast the signal 10 steps forward
>>> forecasted_signal = kca_model.generate_signal_forecast(num_to_forecast=10)
>>> forecasted_signal
array([...
Fast Fourier Transform (FFT)
Overview
Fast Fourier Transform is an algorithm that transforms a signal from time-domain to frequency-domain. FFT is applied to functions the same way PCA is applied to vector spaces.
The authors caution that Fourier analysis fits noise as readily as it fits the signal, so they provide a mechanism for preventing the overfitting of noise.
At every iteration of the algorithm, they scan all unused frequencies for the one that delivers the greatest decrease in the Ljung-Box statistic. The algorithm stops when the probability associated with the Ljung-Box statistic exceeds the threshold min_alpha, or when no remaining frequency can reduce the statistic further, since additional scanning is then unwarranted.
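The selection loop described above can be sketched as follows. This is a simplified reading of the procedure, not the MlFinLab FFT class: the function names, the 10 lags, and the use of a fixed chi-square critical value (18.307, the 95% point for 10 degrees of freedom, standing in for a p-value threshold of 0.05) are assumptions made to keep the example dependency-free.

```python
import cmath
import math
import random

CHI2_95_DF10 = 18.307  # 95% critical value, chi-square with 10 degrees of freedom


def ljung_box_q(x, lags=10):
    """Ljung-Box Q statistic of a series over the first `lags` autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    total = 0.0
    for k in range(1, lags + 1):
        rho = sum(d[t] * d[t - k] for t in range(k, n)) / denom
        total += rho * rho / (n - k)
    return n * (n + 2) * total


def frequency_component(x, k):
    """Real sinusoid contributed by DFT frequency index k (0 < k < n/2)."""
    n = len(x)
    coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return [2.0 * (coeff * cmath.exp(2j * math.pi * k * t / n)).real / n
            for t in range(n)]


def greedy_fourier_denoise(z, lags=10, critical_value=CHI2_95_DF10):
    """Add frequencies one at a time until the residuals look like white noise."""
    n = len(z)
    mean = sum(z) / n
    residual = [v - mean for v in z]
    unused = set(range(1, n // 2))  # positive frequencies below Nyquist
    selected = []
    best_q = ljung_box_q(residual, lags)
    # Q below the critical value corresponds to a p-value above 0.05.
    while unused and best_q > critical_value:
        trials = {}
        for k in unused:
            comp = frequency_component(residual, k)
            trials[k] = ljung_box_q([r - c for r, c in zip(residual, comp)], lags)
        k_best = min(trials, key=trials.get)
        if trials[k_best] >= best_q:
            break  # no unused frequency lowers the statistic any further
        comp = frequency_component(residual, k_best)
        residual = [r - c for r, c in zip(residual, comp)]
        best_q = trials[k_best]
        selected.append(k_best)
        unused.remove(k_best)
    denoised = [v - r for v, r in zip(z, residual)]
    return denoised, selected, best_q


random.seed(1)
n = 120
# Synthetic signal: exactly 3 sine periods plus Gaussian noise (std 0.3).
noisy = [math.sin(2 * math.pi * 3 * t / n) + random.gauss(0.0, 0.3) for t in range(n)]
initial_q = ljung_box_q(noisy)
denoised, selected, final_q = greedy_fourier_denoise(noisy)
print("selected frequency indices:", selected)
print(f"Ljung-Box Q before: {initial_q:.1f}, after: {final_q:.1f}")
```

Because each step re-scans every unused frequency, the cost grows quickly with the series length; the sketch favours clarity over speed.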
Implementation
- class FFT(time_series: DataFrame, min_alpha: float | None = 0.05)
-
Fast Fourier Transform (FFT), transforms a function of time to a function of frequency. It approximates general functions as linear combinations of periodic functions.
Our FFT class implementation selects the most relevant frequencies of a noisy signal, namely those that minimize the Ljung-Box statistic on the sample’s residuals, and thereby extracts a signal.
- generate_signal() array
-
Generate signal by applying FFT fit.
- Returns:
-
(np.array) Extracted signal.
- get_critical_value() Tuple[float, float]
-
Returns value of Ljung-Box statistic associated with extracting our generated signal from the FFT fit.
- Returns:
-
(Tuple[float, float]) The first value of the Tuple is the Ljung-Box test statistic, the second element is the p-value based on chi-square distribution.
- get_selected_frequencies() array
-
Returns selected frequency values after generating signal.
- Returns:
-
(np.array) Complex type array with selected frequencies.
- get_unused_frequencies() dict
-
Returns frequencies that were not selected.
All frequencies that were selected for signal generation will not be listed.
- Returns:
-
(dict) Dictionary with unused frequencies.
- property min_alpha: float
-
Returns minimum alpha.
- Returns:
-
(float) Probability value of obtaining statistical significance.
- set_min_alpha(min_alpha: float)
-
Set minimum alpha.
- Parameters:
-
min_alpha – (float) Set minimum alpha to min_alpha.
Example
>>> import pandas as pd
>>> from mlfinlab.features import fft
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate FFT class with time series
>>> fft_model = fft.FFT(sine_df)
>>> # Generate a signal
>>> fft_signal = fft_model.generate_signal()
>>> fft_signal
array([...
Locally Weighted Scatterplot Smoothing (LOWESS)
Overview
LOWESS fits weighted linear regressions to localized subsets of data in order to filter noise by point.
The parameter fraction tells the LOWESS algorithm what fraction of the data to use for each local fit. As can be seen in the plot above, the smaller the value of fraction (e.g. 0.1), the more closely the fit follows the original signal, at the cost of stability: the generated signal becomes bumpier.
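The effect of fraction can be illustrated with a stripped-down local regression. The sketch below is an assumption-laden simplification, not the statsmodels routine that the MlFinLab class wraps: it uses tricube weights over the nearest neighbours and a single weighted linear fit per point, and it omits the robustness iterations that statsmodels applies by default. The function name lowess_sketch is hypothetical.

```python
import math
import random


def lowess_sketch(x, y, fraction=0.5):
    """Local weighted linear fit at every x, using the nearest fraction of points."""
    n = len(x)
    k = max(2, int(math.ceil(fraction * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        nearest = sorted((abs(xi - x0), i) for i, xi in enumerate(x))[:k]
        d_max = nearest[-1][0] or 1.0
        sw = swx = swy = swxx = swxy = 0.0
        for d, i in nearest:
            w = (1.0 - (d / d_max) ** 3) ** 3  # tricube weight
            sw += w
            swx += w * x[i]
            swy += w * y[i]
            swxx += w * x[i] * x[i]
            swxy += w * x[i] * y[i]
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def mean_sq_diff(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def roughness(values):
    """Sum of squared second differences: larger means a bumpier curve."""
    return sum((values[i + 1] - 2 * values[i] + values[i - 1]) ** 2
               for i in range(1, len(values) - 1))


random.seed(2)
n_points = 300
x = [float(t) for t in range(n_points)]
# Synthetic noisy signal: 3 sine periods plus Gaussian noise (std 0.5).
y = [math.sin(2 * math.pi * 3 * t / n_points) + random.gauss(0.0, 0.5) for t in x]

fit_small = lowess_sketch(x, y, fraction=0.1)
fit_large = lowess_sketch(x, y, fraction=0.5)
print(f"distance to data: 0.1 -> {mean_sq_diff(fit_small, y):.3f}, "
      f"0.5 -> {mean_sq_diff(fit_large, y):.3f}")
print(f"roughness: 0.1 -> {roughness(fit_small):.4f}, 0.5 -> {roughness(fit_large):.4f}")
```

On this synthetic series the smaller fraction stays closer to the data but produces a rougher curve, which is the trade-off the paragraph above describes.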
Implementation
Tip
Our LOWESS implementation is a wrapper around the same algorithm provided by statsmodels. See implementation here.
- class LOWESS(time_series: DataFrame, fraction: float | None = 0.5)
-
Locally Weighted Scatterplot Smoothing (LOWESS) is a non-parametric regression technique used to obtain a smooth line through a scatterplot.
This class implementation applies LOWESS to a noisy time-series signal to obtain a smoother signal as a feature.
- property fraction: float
-
Returns fraction of datapoints used for estimation by LOWESS.
- Returns:
-
(float) Fraction of datapoints.
- generate_signal() array
-
Generate signal by applying a LOWESS fit.
- Returns:
-
(np.array) Smooth signal.
- set_fraction(fraction: float)
-
Set LOWESS fraction value.
- Parameters:
-
fraction – (float) Fraction amount to use for estimations.
Example
>>> import pandas as pd
>>> from mlfinlab.features import lowess
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate LOWESS class with time series
>>> lowess_model = lowess.LOWESS(sine_df)
>>> # Generate a signal
>>> lowess_signal = lowess_model.generate_signal()
>>> lowess_signal
array([...
Research Notebook
The following research notebook can be used to better understand noise reduction methods.