AutoPWLF Documentation
Piecewise Linear Fit with automated selection of the number of segments.
Installation
$ pip install -U autopwlf
Examples
Default usage is to construct AutoPWLF with the x and y data and then call the auto_fit() method. fitfast is set to True by default.
import numpy as np
import autopwlf
# Initialize variables
x = np.arange(0, 150)
y = np.array([
0.1, 0.4, -0.2, 0.7, 1.7, 1.6, 3.1, 1.9, 4.4, 5.3, 3.3,
5.6, 3.7, 5.8, 6. , 5.1, 5.9, 5.7, 10.3, 8. , 7.5, 8.7,
7.9, 9.2, 11. , 15.2, 18.1, 28.4, 29.3, 29.1, 30.4, 38.4, 37.1,
40.5, 42.1, 47.4, 49.1, 50.2, 51.9, 52.5, 53.3, 54.2,
54.1, 58.3, 59.1, 60.2, 54.9, 55.1, 49.3, 48.1, 47.4, 44.1,
43.7, 42.1, 41.5, 40.3, 39.1, 38.9, 37.7, 36.1, 35.5, 34.3,
42.1, 36.8, 33.3, 30.1, 28.9, 27.8, 26.9, 26.1, 25.3, 24.6,
27. , 25.7, 25.7, 24.8, 23.9, 23.4, 22.5, 21.6, 20.7, 19.8, 18.9,
16.1, 15.4, 14.8, 13.9, 14.9, 12.3, 11.4, 10.7, 9.8, 8.9, 8.1,
15.3, 18.0, 14.5 , 11.3, 9.4, 10.5, 9.8, 6.8, 6.4, 5.4, 2.9,
3.8, 2.4, 1.1, 2.5, 3.9, 2.7, 3.7, 4.3, 6.3, 4.9, 6.5 ,
7.7, 6.4 , 7.8, 7.9, 8.7, 8.3, 10.5, 19.7, 13.5, 19.7, 19.6,
28.5, 38. , 39. , 36.6, 38.2, 38.3, 37. , 36.2, 37. , 35.3, 34.9,
33.8, 34.7, 33.1, 33.1, 32.3, 33.9, 30.9, 31.2, 31.7, 30.8, 30.2,
30.3
])
apwlf = autopwlf.AutoPWLF(x, y)
model_fit = apwlf.auto_fit()  # default complexity penalty is 20
Lowering the complexity penalty will often result in more segments.
apwlf = autopwlf.AutoPWLF(x, y)
model_fit = apwlf.auto_fit(complexity_penalty=5)
For better model performance (at the cost of speed), set fitfast=False. A buffer of 0 or 1 is recommended to keep the runtime manageable.
my_pwlf = apwlf.auto_fit(fitfast=False, buffer=1)
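The link between buffer and runtime can be pictured with a small sketch. It assumes, as one plausible reading of the API description, that the buffer widens the range of candidate break counts symmetrically around the number of stationary points, so that each extra unit of buffer adds up to two more models to fit. The function name candidate_break_counts and the min_breaks floor are hypothetical and are not part of AutoPWLF.

```python
def candidate_break_counts(n_stationary, buffer, min_breaks=1):
    """Return the break counts that would be tried (hypothetical helper).

    Assumption: candidates span n_stationary - buffer .. n_stationary + buffer,
    clipped below at min_breaks. The library may build this range differently.
    """
    low = max(min_breaks, n_stationary - buffer)
    high = n_stationary + buffer
    return list(range(low, high + 1))


# With 4 stationary points, buffer=0 tries one model and buffer=2 tries five
print(candidate_break_counts(4, 0))
print(candidate_break_counts(4, 2))
```

Under this assumption, a buffer of 0 or 1 keeps the number of slow (fitfast=False) fits to at most three.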
If an upper and lower limit is known for the number of segments, use the following:
apwlf = autopwlf.AutoPWLF(x, y)
optimal_breaks, model_fit = apwlf.fit(x, y, min_breaks=2, max_breaks=5)  # fit() returns a tuple
If outliers are present in the data, they can be detected and ignored during fitting as follows:
x = np.arange(0, 152)
y = np.array([
0.1, 0.4, -0.2, 0.7, 1.7, 1.6, 3.1, 1.9, 4.4, 5.3,
3.3, 5.6, 3.7, 5.8, 6.0, 5.1, 5.9, 5.7, 10.3, 8.0,
7.5, 8.7, 7.9, 9.2, 11.0, 15.2, 18.1, 28.4, 29.3, 29.1,
30.4, 38.4, 37.1, 40.5, 42.1, 47.4, 0.2, 49.1, 50.2, 51.9,
52.5, 53.3, 54.2, 54.1, 58.3, 59.1, 60.2, 54.9, 55.1, 49.3,
48.1, 47.4, 44.1, 43.7, 42.1, 41.5, 40.3, 39.1, 38.9, 37.7,
36.1, 35.5, 34.3, 42.1, 36.8, 33.3, 30.1, 28.9, 27.8, 26.9,
26.1, 25.3, 24.6, 27.0, 25.7, 25.7, 24.8, 23.9, 23.4, 22.5,
21.6, 20.7, 19.8, 18.9, 16.1, 15.4, 14.8, 13.9, 14.9, 12.3,
11.4, 10.7, 9.8, 8.9, 8.1, 15.3, 18.0, 14.5, 11.3, 9.4,
10.5, 9.8, 6.8, 6.4, 5.4, 2.9, 3.8, 2.4, 1.1, 65.0,
2.5, 3.9, 2.7, 3.7, 4.3, 6.3, 4.9, 6.5, 7.7, 6.4,
7.8, 7.9, 8.7, 8.3, 10.5, 19.7, 13.5, 19.7, 19.6, 28.5,
38.0, 39.0, 36.6, 38.2, 38.3, 37.0, 36.2, 37.0, 35.3, 34.9,
33.8, 34.7, 33.1, 33.1, 32.3, 33.9, 30.9, 31.2, 31.7, 30.8,
30.2, 30.3
])
apwlf = autopwlf.AutoPWLF(x, y)
model_fit = apwlf.auto_fit(outliers=True, outlier_threshold=4)
Contents
- class autopwlf.AutoPWLF(x_data: array, y_data: array, disp_res: bool = False, lapack_driver: str = 'gelsd', degree: int = 1, weights: Optional[array] = None, random_seed: Optional[int] = None, smooth_polyorder: int = 0, peak_threshold: float = 0.25, prominence_threshold: float = 0.1)
Bases: object
Class to find the optimal number of breaks for a piecewise linear fit using the Bayesian Information Criterion (BIC) and then fit the piecewise linear function to the given data set.
- __init__(x_data: array, y_data: array, disp_res: bool = False, lapack_driver: str = 'gelsd', degree: int = 1, weights: Optional[array] = None, random_seed: Optional[int] = None, smooth_polyorder: int = 0, peak_threshold: float = 0.25, prominence_threshold: float = 0.1)
Initialize the AutoPWLF class to find the optimal number of breaks and fit a continuous piecewise linear function. Supply the x and y values to which the piecewise linear function will be fitted, where y = f(x).
- Parameters:
x_data – independent variable
y_data – dependent variable
disp_res – display results
lapack_driver – LAPACK driver to use for the linear least squares problem
degree – degree of the polynomial to fit
weights – weights for the least squares problem
random_seed – random seed for reproducibility
smooth_polyorder – polynomial order for the Savitzky-Golay filter
peak_threshold – threshold for identifying peaks and valleys
prominence_threshold – threshold for identifying significant peaks and valleys
- stationary_points
number of stationary points in the data set
- y_smooth
smoothed y values
- optimal_breaks
optimal number of breaks
- outliers
array containing the outliers
- initial_outlier_model
initial piecewise linear model used for the outlier detection
- outlier_detection_pred
predicted values without the outliers
- outlier_detection_residuals
residuals without the outliers
- auto_fit(fitfast: bool = True, buffer: int = 2, complexity_penalty=20, outliers: bool = False, outlier_threshold: int = 5) -> PiecewiseLinFit
Fit a piecewise linear function with the number of breaks determined automatically from the stationary points. A buffer is added to the number of breaks to allow more flexibility in the model chosen by the BIC.
- Parameters:
fitfast – if true, use the fast fitting method for the piecewise linear fit
buffer – buffer for the number of breaks to allow for more flexibility
complexity_penalty – complexity penalty for the BIC
outliers – if true, remove outliers from the data
outlier_threshold – threshold for identifying outliers: standard deviations from the mean residual
- Returns:
PiecewiseLinFit model
- Return type:
pwlf.PiecewiseLinFit
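The effect of complexity_penalty on the chosen model can be illustrated with a NumPy-only sketch. It makes two assumptions that are not confirmed by this documentation: the score has the common form n * ln(RSS / n) + penalty * k * ln(n), with the penalty acting as a multiplier on the complexity term, and the breakpoints are fixed and evenly spaced rather than optimised. The helper names fit_rss and select_segments are hypothetical; the library's own BIC formula and fitting routine may differ.

```python
import numpy as np


def fit_rss(x, y, n_segments):
    """Residual sum of squares of a continuous piecewise linear least-squares fit.

    Simplification: breakpoints are evenly spaced, not optimised.
    """
    breaks = np.linspace(x.min(), x.max(), n_segments + 1)[1:-1]
    # Columns: intercept, slope, and one hinge max(x - b, 0) per interior break
    columns = [np.ones_like(x), x] + [np.maximum(x - b, 0.0) for b in breaks]
    A = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return float(residuals @ residuals)


def select_segments(x, y, max_segments, penalty):
    """Pick the segment count with the lowest BIC-style score (assumed form)."""
    n = len(x)
    scores = []
    for n_segments in range(1, max_segments + 1):
        rss = fit_rss(x, y, n_segments)
        n_params = n_segments + 1  # intercept + slope + (n_segments - 1) hinges
        scores.append(n * np.log(rss / n) + penalty * n_params * np.log(n))
    return int(np.argmin(scores)) + 1


# A V-shaped signal with a small ripple on top
x = np.linspace(0, 10, 101)
y = np.abs(x - 5) + 0.05 * np.sin(7 * x)

best_high = select_segments(x, y, max_segments=10, penalty=20)
best_low = select_segments(x, y, max_segments=10, penalty=0.5)
print(best_high, best_low)
```

With the large penalty the score settles on the two segments of the underlying V shape. A smaller penalty can never select fewer segments than a larger one under this scoring, which matches the observation that lowering the penalty often yields more segments.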
- find_num_stationary_points() -> int
Find the number of stationary points in the data set to use as the minimum and maximum number of breaks. First, a smoothed interpolation function is fitted to the data. Then the number of peaks and valleys in the data is counted.
- Returns:
Number of stationary points
- Return type:
int
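The smooth-then-count idea can be sketched with NumPy alone. This is a simplified stand-in, not the library's implementation: a moving average replaces the Savitzky-Golay filter, and a count of slope sign changes replaces the peak search controlled by peak_threshold and prominence_threshold. The function name count_stationary_points and the window parameter are hypothetical.

```python
import numpy as np


def count_stationary_points(y, window=5):
    """Count peaks and valleys in y after light smoothing (illustrative only).

    window must be odd so the smoothed series keeps the original length.
    """
    y = np.asarray(y, dtype=float)
    pad = window // 2
    # Pad by repeating the edge values, then apply a moving average
    padded = np.pad(y, pad, mode="edge")
    y_smooth = np.convolve(padded, np.ones(window) / window, mode="valid")
    # Sign of the slope between neighbouring samples; drop flat steps
    slope_sign = np.sign(np.diff(y_smooth))
    slope_sign = slope_sign[slope_sign != 0]
    # Each change of slope sign marks one peak or one valley
    return int(np.count_nonzero(slope_sign[1:] != slope_sign[:-1]))


# Two full sine periods contain two peaks and two valleys
t = np.linspace(0, 4 * np.pi, 200)
print(count_stationary_points(np.sin(t)))
```

On noisy data a sign-change count reports many spurious turning points, which is why stronger smoothing and a prominence threshold are useful in practice.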
- find_outliers(pwlf_model: PiecewiseLinFit, outlier_threshold: int)
Adjust for outliers by fitting a new piecewise linear model without them.
- Parameters:
pwlf_model – PiecewiseLinFit model
outlier_threshold – threshold for identifying outliers: standard deviations from the mean residual
- Returns:
array containing the outliers
- Return type:
outliers
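The threshold rule described above, a number of standard deviations from the mean residual, can be sketched as follows. The function name flag_outliers is hypothetical, and the predictions here come from a known straight line rather than from a fitted PiecewiseLinFit model, so this only illustrates the rule and not the library's full detection and refitting procedure.

```python
import numpy as np


def flag_outliers(y, y_pred, threshold):
    """Boolean mask of points whose residual lies more than `threshold`
    standard deviations from the mean residual (illustrative helper)."""
    residuals = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    deviation = np.abs(residuals - residuals.mean())
    return deviation > threshold * residuals.std()


# Straight-line predictions with small alternating errors and two injected outliers
x_demo = np.arange(50.0)
y_pred = 2.0 * x_demo
errors = np.where(np.arange(50) % 2 == 0, 0.1, -0.1)
errors[10] = 20.0
errors[30] = -15.0
y_demo = y_pred + errors

mask = flag_outliers(y_demo, y_pred, threshold=4)
print(np.flatnonzero(mask))  # indices of the flagged points
```

Because large outliers also inflate the standard deviation, a high threshold can hide the smaller of several outliers; lowering outlier_threshold makes the rule more sensitive.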
- fit(x: array, y: array, min_breaks: int, max_breaks: Optional[int] = None, complexity_penalty: int = 20, fitfast: bool = True) -> tuple
Find the optimal number of breaks for a piecewise linear fit using the Bayesian Information Criterion (BIC), then plot the piecewise linear fit of the given data set.
- Parameters:
x – Independent variable.
y – Dependent variable.
min_breaks – minimum number of breaks
max_breaks – maximum number of breaks
complexity_penalty – complexity penalty for the BIC
fitfast – if true, use the fast fitting method for the piecewise linear fit
- Returns:
Optimal number of breaks, PiecewiseLinFit model
- Return type: