Noise Reduction

Note

Underlying Literature

The following material and code implementations have been adapted from:

Kinetic Component Analysis by Marcos Lopez de Prado & Riccardo Rebonato.

Comparison of noise reduction methods against a sine wave signal with a random component of gaussian noise.

In the paper referred to above, the authors, Marcos Lopez de Prado and Riccardo Rebonato, compare different signal processing techniques against their newly proposed Kinetic Component Analysis (KCA). Their technique decomposes a signal into three hidden components, intuitively associated with position, velocity and acceleration.

Any economic observable is defined as \(P(t)\), which may represent prices, rates, yields, or any other asset value quotation:

\[P(t) = p(t) + h(t)\]

where \(p(t)\) is defined as the fundamental component and \(h(t)\) as a source of noise.

As markets organically evolve over time, these economic observables acquire a noisy component \(h(t)\) that hides the fundamental component \(p(t)\), which is considered to be a more accurate feature of the economic observable \(P(t)\).

The noise reduction methods in the paper aim to reduce the amount of noise in the economic observable of interest in order to obtain a smoother feature given by the fundamental component \(p(t)\).
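To make the decomposition concrete, here is a minimal sketch that builds a noisy observable from a sine wave and Gaussian noise, using only the Python standard library. The helper name `make_noisy_sine` and its parameters are hypothetical and chosen for illustration; they are not part of MlFinLab.

```python
import math
import random


def make_noisy_sine(n_points=300, t_max=10.0, noise_std=0.5, rng_seed=42):
    """Return (t, p, h, observed): time grid, clean sine, noise, and their sum.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(rng_seed)
    t = [t_max * i / n_points for i in range(n_points)]
    p = [math.sin(x) for x in t]                        # fundamental component p(t)
    h = [rng.gauss(0.0, noise_std) for _ in t]          # noise component h(t)
    observed = [p_i + h_i for p_i, h_i in zip(p, h)]    # observable P(t) = p(t) + h(t)
    return t, p, h, observed


t, p, h, observed = make_noisy_sine()
```

A noise reduction method only sees `observed`; its goal is to return something close to `p`.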

Below are reference implementations for the methods discussed in the paper. All the noise reduction methods inherit from a common NoiseReductionMethod class. The examples below illustrate the methods as implemented in the MlFinLab package.

The noise reduction methods expect a pandas.DataFrame time series with the “noisy signal” as the first column of the dataframe. To run the core noise reduction algorithm on the noisy signal, use the generate_signal() method. For the other class methods, see the corresponding implementation reference sections.

>>> import pandas as pd
>>> from mlfinlab.features import kca, fft, lowess
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate classes with time series
>>> # (distinct variable names keep the imported modules usable)
>>> kca_model = kca.KCA(sine_df, include_std=True, sigma=2)
>>> fft_model = fft.FFT(sine_df, min_alpha=0.05)
>>> lowess_model = lowess.LOWESS(sine_df, fraction=0.1)
>>> # Generate signals and remove noise
>>> kca_position_signal = kca_model.generate_signal()
>>> fft_signal = fft_model.generate_signal()
>>> lowess_signal = lowess_model.generate_signal()
>>> lowess_signal
array([[...

                    

Kinetic Component Analysis (KCA)

Overview

Illustration shows KCA components with confidence intervals over noisy measurements and original sine wave signal.

The authors present Kinetic Component Analysis as a state-space application that extracts the signal from a series of noisy measurements by applying a Kalman Filter to a Taylor expansion of a stochastic process.
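As a sketch of what this state-space form looks like, a second-order Taylor expansion of the position over a step \(\Delta t\) gives the transition equation for the hidden state. The symbols \(v(t)\), \(a(t)\), \(\Delta t\) and the state noise \(u(t)\) are notation assumed here for illustration; \(P(t)\), \(p(t)\) and \(h(t)\) are as defined earlier.

\[\begin{pmatrix} p(t+\Delta t) \\ v(t+\Delta t) \\ a(t+\Delta t) \end{pmatrix} = \begin{pmatrix} 1 & \Delta t & \tfrac{1}{2}\Delta t^{2} \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p(t) \\ v(t) \\ a(t) \end{pmatrix} + u(t)\]

Only the position is observed, and it is observed with noise:

\[P(t) = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} p(t) \\ v(t) \\ a(t) \end{pmatrix} + h(t) = p(t) + h(t)\]

The Kalman Filter then estimates the three hidden components from the observed series \(P(t)\).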

The value of KCA against other methods explored in the paper is that it provides the following unique capabilities:

Band estimates (e.g. confidence intervals) in addition to point estimates

Additional information via decomposition of three hidden components (position, velocity, acceleration)

Forecasting capabilities
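The forecasting capability follows from the kinetic interpretation: once position, velocity and acceleration have been estimated at the last observation, they can be rolled forward with a Taylor step. The sketch below shows that idea in isolation. The function `forecast_positions` and the step size `dt` are hypothetical, and this is an assumed simplification rather than the library's internal forecasting code.

```python
def forecast_positions(state, dt, steps):
    """Roll (position, velocity, acceleration) forward with a Taylor step.

    Hypothetical helper: acceleration is held constant over the horizon.
    """
    position, velocity, acceleration = state
    forecasts = []
    for _ in range(steps):
        # Position uses the velocity and acceleration from the previous step.
        position = position + velocity * dt + 0.5 * acceleration * dt ** 2
        velocity = velocity + acceleration * dt
        forecasts.append(position)
    return forecasts


# Starting at rest with constant acceleration 2, position follows t**2.
print(forecast_positions((0.0, 0.0, 2.0), dt=1.0, steps=3))  # [1.0, 4.0, 9.0]
```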

Implementation

Important

The greater the value of seed the more likely we are to overfit. This value indicates to the model that a greater proportion of noise comes from the states rather than the measurements (observations).

The authors recommend trying different values of seed that are consistent with your understanding of the system in question.

Generally, opt for lower values of seed to avoid overfitting.
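The effect described above can be illustrated with a small, dependency-free Kalman filter. This is a sketch under stated assumptions, not the MlFinLab implementation: it runs a plain forward filter with hand-fixed noise levels, the function `kinetic_filter` is hypothetical, and the parameter `q` is assumed to play the role that seed plays above (scaling the state covariance relative to the measurement variance `r`).

```python
import math
import random


def kinetic_filter(observations, dt, q, r=0.25):
    """Forward Kalman filter over (position, velocity, acceleration).

    q scales the state covariance, r is the measurement variance.
    Both are fixed by hand; no parameter estimation is performed.
    """
    A = [[1.0, dt, 0.5 * dt * dt], [0.0, 1.0, dt], [0.0, 0.0, 1.0]]
    x = [observations[0], 0.0, 0.0]
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    positions = []
    for z in observations:
        # Predict: x <- A x, P <- A P A^T + q I
        x = [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]
        AP = [[sum(A[i][k] * P[k][j] for k in range(3)) for j in range(3)]
              for i in range(3)]
        P = [[sum(AP[i][k] * A[j][k] for k in range(3)) + (q if i == j else 0.0)
              for j in range(3)] for i in range(3)]
        # Update: only the position is observed, so the innovation variance is a scalar.
        s = P[0][0] + r
        gain = [P[i][0] / s for i in range(3)]
        residual = z - x[0]
        x = [x[i] + gain[i] * residual for i in range(3)]
        P = [[P[i][j] - gain[i] * P[0][j] for j in range(3)] for i in range(3)]
        positions.append(x[0])
    return positions


def mean_sq_gap(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(0)
dt = 10.0 / 300
clean = [math.sin(i * dt) for i in range(300)]
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]

loose = kinetic_filter(noisy, dt, q=100.0)  # large q: noise attributed to the states
tight = kinetic_filter(noisy, dt, q=1e-4)   # small q: noise attributed to measurements
print(mean_sq_gap(loose, noisy) < mean_sq_gap(tight, noisy))  # True
```

With a large `q` the filtered position reproduces the noisy measurements almost exactly, which is the overfitting behaviour the note warns about; a small `q` yields a smoother position estimate.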

class KCA(time_series: DataFrame, include_std: bool | None = False, sigma: int | None = 0, seed: int | None = 0.01)

Kinetic Component Analysis (KCA), a state-space application that extracts a signal from a series of noisy measurements by applying a Kalman Filter on a Taylor expansion of a stochastic process.

KCA decomposes the noisy signal in terms of three hidden components, which can be intuitively associated with position, velocity and acceleration.

property forecast_df: DataFrame

Returns DataFrame with forecast.

Returns:: (pd.DataFrame) DataFrame with component forecasts.

generate_next_point_forecast() → Series

Generate forecast for next point in signal (position).

Returns:: (pd.Series) Next point forecast row with all components.

generate_signal() → array

Generates signal (position) and other components (velocity and acceleration) by running core KCA fit method.

Returns:: (np.array) Main generated signal (position).

generate_signal_forecast(num_to_forecast: float) → array

Generate forecast of main signal (position) and its other components.

Parameters:: num_to_forecast – (int) Number of points to forecast
Returns:: (np.array) Main signal (position) forecast.

generate_signal_with_forecast(num_to_forecast: int) → array

Generate forecast of signal (plus components) and return main signal (position) with forecast included.

Parameters:: num_to_forecast – (int) Number of points to forecast
Returns:: (np.array) Main signal (position) with forecast included.

get_acceleration_signal() → array

Returns KCA acceleration component signal.

Returns:: (np.array) Acceleration component signal.

get_position_signal() → array

Returns KCA position component signal.

Returns:: (np.array) Position signal.

get_velocity_signal() → array

Returns KCA velocity component signal.

Returns:: (np.array) Velocity component signal.

property seed: float

Returns the current seed value.

Returns:: (float) Current seed value.

update_seed(new_seed: float)

Update seed value.

Parameters:: new_seed – (float) New seed value to set.

Example

>>> import pandas as pd
>>> from mlfinlab.features import kca
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate KCA class with time series
>>> # (include standard deviation and confidence intervals)
>>> kca_model = kca.KCA(sine_df, include_std=True, sigma=2, seed=0.001)
>>> # Generate a signal (KCA position)
>>> position = kca_model.generate_signal()
>>> # Get KCA dataframe with all components
>>> kca_df = kca_model.dataframe
>>> # Forecast the signal 10 steps forward
>>> forecasted_signal = kca_model.generate_signal_forecast(num_to_forecast=10)
>>> forecasted_signal
array([...

                        

Fast Fourier Transform (FFT)

Overview

FFT signal extraction for varying values of min_alpha

Fast Fourier Transform is an algorithm that transforms a signal from time-domain to frequency-domain. FFT is applied to functions the same way PCA is applied to vector spaces.

The authors caution that Fourier analysis fits noise just as readily as it fits the signal, so they provide a mechanism to prevent overfitting.

At every iteration of the algorithm, they scan all unused frequencies, looking for the one that delivers the greatest decrease in the Ljung-Box statistic. The algorithm stops when the probability associated with the Ljung-Box statistic exceeds the threshold min_alpha, or when no remaining frequency can reduce the statistic any further, since additional scanning is then unwarranted.
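The stopping rule relies on the Ljung-Box test, which measures how much autocorrelation is left in a series. Below is a minimal standard-library sketch of that test, to show what "probability associated with the Ljung-Box statistic" means in practice. The helper `ljung_box`, the choice of 10 lags, and the closed-form chi-square tail (valid only for an even number of lags) are assumptions made for this illustration; the lag count used by MlFinLab is not stated here.

```python
import math
import random


def ljung_box(series, lags=10):
    """Return (Q statistic, p-value) of the Ljung-Box test; lags must be even."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    q_stat = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k
        rho_k = sum(centered[i] * centered[i - k] for i in range(k, n)) / denom
        q_stat += rho_k ** 2 / (n - k)
    q_stat *= n * (n + 2)
    # Chi-square survival function, closed form for even degrees of freedom
    half = q_stat / 2.0
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(lags // 2)
    )
    return q_stat, p_value


rng = random.Random(1)
clean = [math.sin(10.0 * i / 300) for i in range(300)]
noise = [rng.gauss(0.0, 0.5) for _ in clean]
noisy = [c + e for c, e in zip(clean, noise)]

min_alpha = 0.05
q_raw, p_raw = ljung_box(noisy)   # signal still present: strong autocorrelation
q_res, p_res = ljung_box(noise)   # residuals once the true signal is removed
print(p_raw < min_alpha, q_res < q_raw)
```

While the residuals still contain signal, the p-value stays below `min_alpha` and the search continues; once the residuals resemble white noise, the statistic drops and the p-value rises.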

Implementation

class FFT(time_series: DataFrame, min_alpha: float | None = 0.05)

Fast Fourier Transform (FFT) transforms a function of time into a function of frequency. It approximates general functions as linear combinations of periodic functions.

Our FFT class implementation selects the most relevant frequencies of a noisy signal, those that minimize the Ljung-Box statistic on the sample’s residuals, and consequently extracts a signal.

generate_signal() → array

Generate signal by applying FFT fit.

Returns:: (np.array) Extracted signal.

get_critical_value() → Tuple[float, float]

Returns value of Ljung-Box statistic associated with extracting our generated signal from the FFT fit.

Returns:: (Tuple[float, float]) The first value of the Tuple is the Ljung-Box test statistic, the second element is the p-value based on chi-square distribution.

get_selected_frequencies() → array

Returns selected frequency values after generating signal.

Returns:: (np.array) Complex type array with selected frequencies.

get_unused_frequencies() → dict

Returns frequencies that were not selected.

All frequencies that were selected for signal generation will not be listed.

Returns:: (dict) Dictionary with unused frequencies.

property min_alpha: float

Returns minimum alpha.

Returns:: (float) Probability value of obtaining statistical significance.

set_min_alpha(min_alpha: float)

Set minimum alpha.

Parameters:: min_alpha – (float) Set minimum alpha to min_alpha.

Example

>>> import pandas as pd
>>> from mlfinlab.features import fft
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate FFT class with time series
>>> fft_model = fft.FFT(sine_df)
>>> # Generate a signal
>>> position = fft_model.generate_signal()
>>> position
array([...

                        

Locally Weighted Scatterplot Smoothing (LOWESS)

Overview

LOWESS signal extraction for varying values of fraction

LOWESS fits weighted linear regressions to localized subsets of data in order to filter noise point by point.

The parameter fraction tells the LOWESS algorithm what fraction of the data to use for each local fit. As can be seen in the plot above, the smaller the value of fraction (e.g. frac=0.1), the closer the fit to the original signal, at the cost of stability: bumps appear in the generated signal.
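The trade-off can be reproduced with a minimal standard-library sketch of the core idea: a local linear regression around each point, weighted by a tricube kernel over the nearest fraction of the data. The function `lowess_sketch` is hypothetical and deliberately simplified; it performs a single pass, whereas the statsmodels routine also applies robustness iterations by default.

```python
import math
import random


def lowess_sketch(x, y, fraction):
    """Single-pass local linear regression with tricube weights."""
    n = len(x)
    k = max(2, min(n, int(math.ceil(fraction * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # k nearest neighbours of x0 and the bandwidth they span
        nearest = sorted(range(n), key=lambda j: abs(x[j] - x0))[:k]
        bandwidth = max(abs(x[j] - x0) for j in nearest) or 1.0
        w = [(1.0 - (abs(x[j] - x0) / bandwidth) ** 3) ** 3 for j in nearest]
        # Weighted least-squares line through the neighbours
        sw = sum(w)
        mx = sum(wi * x[j] for wi, j in zip(w, nearest)) / sw
        my = sum(wi * y[j] for wi, j in zip(w, nearest)) / sw
        sxx = sum(wi * (x[j] - mx) ** 2 for wi, j in zip(w, nearest))
        sxy = sum(wi * (x[j] - mx) * (y[j] - my) for wi, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(2)
t = [10.0 * i / 300 for i in range(300)]
clean = [math.sin(v) for v in t]
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]

narrow = lowess_sketch(t, noisy, fraction=0.1)
wide = lowess_sketch(t, noisy, fraction=0.8)
print(mse(narrow, clean) < mse(wide, clean))
```

On this sine wave, the narrow window tracks the underlying signal more closely, while the wide window averages over so much of the cycle that it flattens the signal.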

Implementation

Tip

Our LOWESS implementation is a wrapper around the same algorithm provided by statsmodels. See implementation here.

class LOWESS(time_series: DataFrame, fraction: float | None = 0.5)

Locally Weighted Scatterplot Smoothing (LOWESS) is a non-parametric regression technique for obtaining a smooth line through a scatterplot.

This class implementation applies LOWESS to a noisy time-series signal to obtain a smoother signal as a feature.

property fraction: float

Returns fraction of datapoints used for estimation by LOWESS.

Returns:: (float) Fraction of datapoints.

generate_signal() → array

Generate signal by applying a LOWESS fit.

Returns:: (np.array) Smooth signal.

set_fraction(fraction: float)

Set LOWESS fraction value.

Parameters:: fraction – (float) Fraction amount to use for estimations.

Example

>>> import pandas as pd
>>> from mlfinlab.features import lowess
>>> from mlfinlab.util.generate_dataset import generate_periodic_signal
>>> # Obtain signal as a pandas DataFrame time series
>>> (t, signal, z) = generate_periodic_signal(10, 300, 0.5)
>>> sine_df = pd.DataFrame(z, index=t)
>>> # Instantiate LOWESS class with time series
>>> lowess_model = lowess.LOWESS(sine_df)
>>> # Generate a signal
>>> position = lowess_model.generate_signal()
>>> position
array([...

                        

Research Notebook

The following research notebook can be used to better understand noise reduction methods.

Noise Reduction

References

Lopez de Prado, M., Rebonato, R., 2016. Kinetic Component Analysis (KCA). Available at SSRN 2422183.