AutoPWLF Documentation

Piecewise Linear Fit with automated selection of the number of segments.

Installation

$ pip install -U autopwlf

Examples

Default usage involves creating an AutoPWLF instance with the x and y data and calling its auto_fit() method. fitfast is set to True by default.

import numpy as np
import autopwlf

# Initialize variables
x = np.arange(0, 150)

y = np.array([
        0.1,  0.4, -0.2,  0.7,  1.7,  1.6,  3.1,  1.9,  4.4,  5.3,  3.3,
        5.6,  3.7,  5.8,  6. ,  5.1,  5.9,  5.7, 10.3,  8. ,  7.5,  8.7,
        7.9,  9.2, 11. , 15.2, 18.1, 28.4, 29.3, 29.1, 30.4, 38.4, 37.1,
        40.5, 42.1, 47.4, 49.1, 50.2, 51.9, 52.5, 53.3, 54.2,
        54.1, 58.3, 59.1, 60.2, 54.9, 55.1, 49.3, 48.1, 47.4, 44.1,
        43.7, 42.1, 41.5, 40.3, 39.1, 38.9, 37.7, 36.1, 35.5, 34.3,
        42.1, 36.8, 33.3, 30.1, 28.9, 27.8, 26.9, 26.1, 25.3, 24.6,
        27. , 25.7, 25.7, 24.8, 23.9, 23.4, 22.5, 21.6, 20.7, 19.8, 18.9,
        16.1, 15.4, 14.8, 13.9, 14.9, 12.3, 11.4, 10.7,  9.8,  8.9,  8.1,
        15.3, 18.0, 14.5 , 11.3,  9.4, 10.5, 9.8, 6.8, 6.4, 5.4, 2.9,
        3.8, 2.4, 1.1, 2.5, 3.9, 2.7, 3.7, 4.3, 6.3, 4.9, 6.5 ,
        7.7, 6.4 , 7.8, 7.9, 8.7, 8.3, 10.5, 19.7, 13.5, 19.7, 19.6,
        28.5, 38. , 39. , 36.6, 38.2, 38.3, 37. , 36.2, 37. , 35.3, 34.9,
        33.8, 34.7, 33.1, 33.1, 32.3, 33.9, 30.9, 31.2, 31.7, 30.8, 30.2,
        30.3
        ])

apwlf = autopwlf.AutoPWLF(x, y)
model_fit = apwlf.auto_fit()  # default complexity_penalty is 20
_images/example_default_params.png

Lowering the complexity penalty will often result in more segments.

apwlf = autopwlf.AutoPWLF(x, y)
model_fit = apwlf.auto_fit(complexity_penalty=5)
_images/example_low_complex_penalty.png

For better model performance (at the cost of speed), set fitfast=False. A buffer of 0 or 1 is recommended to keep the runtime manageable.

my_pwlf = apwlf.auto_fit(fitfast=False, buffer=1)

If lower and upper limits are known for the number of breaks, call fit() directly. It returns a tuple of the optimal number of breaks and the fitted model:

apwlf = autopwlf.AutoPWLF(x, y)
optimal_breaks, model_fit = apwlf.fit(x, y, min_breaks=2, max_breaks=5)

If outliers are present in the data, they can be detected and excluded from the fit as follows:

x = np.arange(0, 152)
y = np.array([
    0.1,  0.4, -0.2,  0.7,  1.7,  1.6,  3.1,  1.9,  4.4,  5.3,
    3.3,  5.6,  3.7,  5.8,  6.0,  5.1,  5.9,  5.7, 10.3,  8.0,
    7.5,  8.7,  7.9,  9.2, 11.0, 15.2, 18.1, 28.4, 29.3, 29.1,
    30.4, 38.4, 37.1, 40.5, 42.1, 47.4,  0.2, 49.1, 50.2, 51.9,
    52.5, 53.3, 54.2, 54.1, 58.3, 59.1, 60.2, 54.9, 55.1, 49.3,
    48.1, 47.4, 44.1, 43.7, 42.1, 41.5, 40.3, 39.1, 38.9, 37.7,
    36.1, 35.5, 34.3, 42.1, 36.8, 33.3, 30.1, 28.9, 27.8, 26.9,
    26.1, 25.3, 24.6, 27.0, 25.7, 25.7, 24.8, 23.9, 23.4, 22.5,
    21.6, 20.7, 19.8, 18.9, 16.1, 15.4, 14.8, 13.9, 14.9, 12.3,
    11.4, 10.7,  9.8,  8.9,  8.1, 15.3, 18.0, 14.5, 11.3,  9.4,
    10.5,  9.8,  6.8,  6.4,  5.4,  2.9,  3.8,  2.4,  1.1, 65.0,
    2.5,  3.9,  2.7,  3.7,  4.3,  6.3,  4.9,  6.5,  7.7,  6.4,
    7.8,  7.9,  8.7,  8.3, 10.5, 19.7, 13.5, 19.7, 19.6, 28.5,
    38.0, 39.0, 36.6, 38.2, 38.3, 37.0, 36.2, 37.0, 35.3, 34.9,
    33.8, 34.7, 33.1, 33.1, 32.3, 33.9, 30.9, 31.2, 31.7, 30.8,
    30.2, 30.3
    ])

apwlf = autopwlf.AutoPWLF(x, y)
model_fit = apwlf.auto_fit(outliers=True, outlier_threshold=4)
_images/example_with_outliers.png

Contents

class autopwlf.AutoPWLF(x_data: array, y_data: array, disp_res: bool = False, lapack_driver: str = 'gelsd', degree: int = 1, weights: Optional[array] = None, random_seed: Optional[int] = None, smooth_polyorder: int = 0, peak_threshold: float = 0.25, prominence_threshold: float = 0.1)

Bases: object

Class that finds the optimal number of breaks for a piecewise linear fit using the Bayesian Information Criterion (BIC) and then fits the piecewise linear function to the given data set.

__init__(x_data: array, y_data: array, disp_res: bool = False, lapack_driver: str = 'gelsd', degree: int = 1, weights: Optional[array] = None, random_seed: Optional[int] = None, smooth_polyorder: int = 0, peak_threshold: float = 0.25, prominence_threshold: float = 0.1)

Initialize the AutoPWLF class to find the optimal number of breaks and fit a continuous piecewise linear function. Supply the x and y values to which the piecewise linear function is fitted, where y = f(x).

Parameters:
  • x_data – independent variable

  • y_data – dependent variable

  • disp_res – display results

  • lapack_driver – LAPACK driver to use for the linear least squares problem

  • degree – degree of the polynomial to fit

  • weights – weights for the least squares problem

  • random_seed – random seed for reproducibility

  • smooth_polyorder – polynomial order for the Savitzky-Golay filter

  • peak_threshold – threshold for identifying peaks and valleys

  • prominence_threshold – threshold for identifying significant peaks and valleys

stationary_points

number of stationary points in the data set

y_smooth

smoothed y values

optimal_breaks

optimal number of breaks

outliers

array containing the outliers

initial_outlier_model

initial piecewise linear model used for the outlier detection

outlier_detection_pred

predicted values without the outliers

outlier_detection_residuals

residuals without the outliers

auto_fit(fitfast: bool = True, buffer: int = 2, complexity_penalty=20, outliers: bool = False, outlier_threshold: int = 5) → PiecewiseLinFit

Fit a piecewise linear function with an automatically chosen number of breaks, estimated from the stationary points. A buffer is added to the number of breaks to give the BIC more flexibility in choosing the model.

Parameters:
  • fitfast – if true, use the fast fitting method for the piecewise linear fit

  • buffer – buffer for the number of breaks to allow for more flexibility

  • complexity_penalty – complexity penalty for the BIC

  • outliers – if true, remove outliers from the data

  • outlier_threshold – threshold for identifying outliers: standard deviations from the mean residual

Returns:

PiecewiseLinFit model

Return type:

pwlf.PiecewiseLinFit
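One way to picture the role of buffer: the stationary-point count gives a first guess for the number of breaks, and the buffer widens that guess into a range of candidates for the BIC to choose from. The sketch below illustrates that idea only. It is an assumption, not AutoPWLF's actual code, and candidate_break_counts is a hypothetical helper name.

```python
def candidate_break_counts(n_stationary: int, buffer: int = 2) -> list[int]:
    """Hypothetical helper: widen a break-count guess by `buffer` on each side."""
    lowest = max(1, n_stationary - buffer)  # assume at least one break
    highest = n_stationary + buffer
    return list(range(lowest, highest + 1))


print(candidate_break_counts(4, buffer=2))  # [2, 3, 4, 5, 6]
print(candidate_break_counts(4, buffer=0))  # [4]
```

Under this reading, a smaller buffer means fewer candidate models to fit, which is why a small buffer keeps the runtime down when fitfast=False.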

find_num_stationary_points() → int

Find the number of stationary points in the data set to use as the min and max breaks. First, a smoothed interpolation function is fitted to the data; then the number of peaks and valleys in the data is counted.

Returns:

Number of stationary points

Return type:

int
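Counting peaks and valleys can be illustrated on already-smooth data by looking for sign changes in the successive differences. This is a minimal sketch of the general idea, not the library's implementation: it skips the smoothing step and ignores peak_threshold and prominence_threshold, and count_stationary_points is a hypothetical name.

```python
def count_stationary_points(y: list[float]) -> int:
    """Count interior peaks and valleys as sign changes in successive differences."""
    # Successive differences, skipping flat steps so plateaus are not double-counted.
    diffs = [b - a for a, b in zip(y, y[1:]) if b != a]
    # A change of sign between consecutive slopes marks a peak or a valley.
    return sum(1 for d0, d1 in zip(diffs, diffs[1:]) if (d0 > 0) != (d1 > 0))


smooth = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0]
print(count_stationary_points(smooth))  # 2: one peak at 3.0, one valley at 0.0
```

On noisy data this naive count would be far too high, which is why smoothing and thresholds matter in practice.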

find_outliers(pwlf_model: PiecewiseLinFit, outlier_threshold: int)

Handle the outliers by fitting a new piecewise linear model with the outliers excluded.

Parameters:
  • pwlf_model – PiecewiseLinFit model

  • outlier_threshold – threshold for identifying outliers: standard deviations from the mean residual

Returns:

array containing the outliers

Return type:

outliers
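The threshold rule described above (standard deviations from the mean residual) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: flag_outliers is a hypothetical helper, the predictions stand in for an initial model, and the population standard deviation is used.

```python
from statistics import mean, pstdev


def flag_outliers(y_true, y_pred, threshold=5):
    """Return indices whose residual lies more than `threshold` std devs from the mean residual."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # a perfect fit has no outliers
    return [i for i, r in enumerate(residuals) if abs(r - mu) > threshold * sigma]


# 20 points on a straight line, with one corrupted value at index 7
y_pred = [float(i) for i in range(20)]
y_true = list(y_pred)
y_true[7] = 60.0
print(flag_outliers(y_true, y_pred, threshold=4))  # [7]
```

Note that a single extreme point inflates the standard deviation itself, so on short series a high threshold may flag nothing; lowering outlier_threshold, as in the example with outlier_threshold=4, makes detection more sensitive.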

fit(x: array, y: array, min_breaks: int, max_breaks: Optional[int] = None, complexity_penalty: int = 20, fitfast: bool = True) → tuple

Finds the optimal number of breaks for a piecewise linear fit using the Bayesian Information Criterion (BIC) and then plots the piecewise linear fit of the given data set

Parameters:
  • x – independent variable

  • y – dependent variable

  • min_breaks – minimum number of breaks

  • max_breaks – maximum number of breaks

  • complexity_penalty – complexity penalty for the BIC

  • fitfast – if true, use the fast fitting method for the piecewise linear fit

Returns:

Optimal number of breaks, PiecewiseLinFit model

Return type:

tuple
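Model selection by BIC can be sketched with the standard Gaussian form, n·ln(RSS/n) + k·ln(n), where n is the number of points and k the number of free parameters. How AutoPWLF applies complexity_penalty is not stated here, so the multiplicative weight on the complexity term below is an assumption, and penalized_bic and select_model are hypothetical names. The residual sums of squares are made-up inputs for illustration.

```python
import math


def penalized_bic(rss: float, n_points: int, n_params: int, penalty: float = 1.0) -> float:
    """Gaussian BIC with a hypothetical weight on the complexity term."""
    return n_points * math.log(rss / n_points) + penalty * n_params * math.log(n_points)


def select_model(rss_by_params: dict[int, float], n_points: int, penalty: float = 1.0) -> int:
    """Return the parameter count with the lowest penalized BIC."""
    return min(
        rss_by_params,
        key=lambda k: penalized_bic(rss_by_params[k], n_points, k, penalty),
    )


# Made-up residual sums of squares for models with 2, 4 and 6 free parameters
rss = {2: 900.0, 4: 300.0, 6: 250.0}
print(select_model(rss, n_points=150, penalty=1.0))   # 6
print(select_model(rss, n_points=150, penalty=20.0))  # 2
```

In this sketch a larger penalty favours the simpler model, which matches the behaviour described in the examples: lowering the complexity penalty tends to produce more segments.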