Polynomial Equations, Noise & Extrapolation
Pitfall Information
Sample polynomial equation:
Pros:
Able to fit a wide variety of data sets
Numerically easy to fit, as it uses linear least squares procedures
Cons:
Extrapolation can be very risky
Tendency to fit noise, especially with higher order equations
Unable to fit straight lines
How to identify problem:
Graphical inspection
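To make the "linear least squares" point above concrete, here is a minimal sketch that assumes NumPy is available. The function name fit_polynomial and the sample data are made up for illustration. Because a polynomial is linear in its coefficients, the fit reduces to a single least squares solve on a matrix of powers of X.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Return c[0..order] for y ~ c[0] + c[1]*x + ... + c[order]*x**order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each column of the design matrix is a power of x: 1, x, x**2, ...
    design = np.vander(x, order + 1, increasing=True)
    # The model is linear in the coefficients, so one least squares solve suffices.
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Made-up, noise-free data generated from y = 1 + 2x + 0.5x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2

coeffs = fit_polynomial(x, y, order=2)
print(coeffs)  # approximately [1.0, 2.0, 0.5]
```

No starting guesses or iterations are needed, which is why polynomial fits are numerically easy compared with nonlinear models.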
Extrapolation
Generally speaking, polynomial equations actually do a very good job of fitting
a variety of data set profiles. One major problem lies in the area of extrapolation.
This is the process of using your curve fit model to predict data values beyond
the X data range provided for the curve fit.
The main problem with using polynomials for extrapolation is that they have a
strong tendency to change direction once they leave the X data range. Here are
two examples:
Example 1: The fitted curve goes up abruptly
Example 2: The fitted curve levels off slightly, and then goes down
If you wish to perform extrapolation, it's strongly recommended to inspect the
area of interest before doing so. In both of these examples, performing any
extrapolation would be unreliable, because neither curve fit really follows the
data set profile outside of the X data range.
You could fix the extrapolation problem by using a different model, but the better
solution is to obtain data for the X range you were originally trying to extrapolate
to, and then perform the curve fit again.
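The extrapolation risk described above can be checked numerically as well as graphically. The sketch below assumes NumPy and uses a hypothetical data set that rises and then levels off. The function true_curve, the X range of 0 to 5, and the extrapolation point X = 10 are all illustrative choices, not taken from the examples above.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_curve(x):
    # Hypothetical process that rises quickly and then levels off near 1
    return 1.0 - np.exp(-x)

# Data is only available for X between 0 and 5
x_data = np.linspace(0.0, 5.0, 11)
y_data = true_curve(x_data)

# Fit a 3rd order polynomial to the available data
cubic = Polynomial.fit(x_data, y_data, deg=3)

# Compare the fit quality inside the data range with one point outside it
in_range_error = float(np.max(np.abs(cubic(x_data) - y_data)))
x_outside = 10.0
extrapolation_error = float(abs(cubic(x_outside) - true_curve(x_outside)))

print(f"largest error inside the data range: {in_range_error:.4f}")
print(f"error when extrapolating to X = {x_outside}: {extrapolation_error:.4f}")
```

The fit looks fine inside the X data range, yet the error outside it is far larger, because the polynomial changes direction while the underlying profile stays level.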
Fitting Noise
The worst characteristic of polynomial equations is their tendency to fit noise
within a data set. If your data doesn't have much noise or outliers, then they
can generally be used safely. Otherwise, you will obtain a very strange curve
fit.
In this example, we were attempting to fit a 7th order polynomial equation to
a slightly noisy data set that has a quadratic equation profile. Notice that the
resulting curve fit isn't following the data trend at all because the noise is
being fitted instead. You are essentially ending up with a "connect the dots"
curve fit.
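A rough way to reproduce this behavior is sketched below, assuming NumPy. The nine data points, the quadratic true_curve, and the noise level are invented for illustration and are not the data shown in the graph. Since the data is synthetic, the fitted values can be compared with both the noisy data and the noise-free trend.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_curve(x):
    # Hypothetical quadratic trend underneath the noise
    return 2.0 + 3.0 * x - 0.4 * x**2

# Nine slightly noisy data points
x_data = np.arange(9, dtype=float)
y_data = true_curve(x_data) + rng.normal(scale=0.3, size=x_data.size)

fit7 = Polynomial.fit(x_data, y_data, deg=7)  # 7th order polynomial
fit2 = Polynomial.fit(x_data, y_data, deg=2)  # quadratic

def rms(values):
    return float(np.sqrt(np.mean(np.square(values))))

for name, fit in (("7th order", fit7), ("quadratic", fit2)):
    to_data = rms(fit(x_data) - y_data)               # distance to the noisy points
    to_trend = rms(fit(x_data) - true_curve(x_data))  # distance to the real trend
    print(f"{name}: RMS to data = {to_data:.3f}, RMS to trend = {to_trend:.3f}")
```

The 7th order fit sits closer to the noisy points but farther from the underlying trend, which is what fitting noise means in practice.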
So you might think - what's the problem? Well, there are three major problems
with this model. We will focus on three sections of the graph:
Problem #1: Extrapolation around the first data point
Notice that the curve fit comes down from infinity, goes through the first data
point, overshoots it, and then travels back up. This behavior makes any extrapolation
before or even near the first data point unreliable.
Problem #2: Fitting noise in the middle of the data set
The data points in the middle could be fitted much better. The peak is shifted
over too far to the right, which would also make interpolation unreliable. This
is being caused by the small amount of noise.
Problem #3: Strange behavior at end of data set
The last three data points are basically forming a straight line. Notice that
the curve fit squiggles out before reaching the last data point.
There's another bonus pitfall in this curve fit. Did you notice it?
Hint: Look at the information at the top of the graph. It doesn't deal with
numbers.
Here's a much better curve fit, which uses a quadratic equation:
This fit is far more trustworthy. The r-square is lower than that of the previous
fit, but there are two things to notice:
1) The curve fit clearly follows the data trend, which will give more realistic
values when performing interpolation and/or extrapolation.
2) Only a minimal amount of noise is being fitted.
The bonus pitfall: the previous curve fit used eight parameters for only nine
data points! By contrast, this model used only three and resulted in a better
fit. This is elaborated in the section Redundant Parameters.
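The relationship between r-square and parameter count can be sketched as follows, assuming NumPy. The data set is synthetic, and the helper r_squared and the "spare_points" label are names chosen here for illustration. A polynomial of order n has n + 1 parameters, so a 7th order fit to nine points leaves only one data point to spare.

```python
import numpy as np
from numpy.polynomial import Polynomial

def r_squared(y, y_fit):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic data: nine points on a quadratic trend plus a little noise
rng = np.random.default_rng(1)
x = np.arange(9, dtype=float)
y = 2.0 + 3.0 * x - 0.4 * x**2 + rng.normal(scale=0.3, size=x.size)

summary = {}
for order in (7, 2):
    fit = Polynomial.fit(x, y, deg=order)
    n_params = order + 1  # a polynomial of order n has n + 1 coefficients
    summary[order] = {
        "r_squared": r_squared(y, fit(x)),
        "parameters": n_params,
        "spare_points": x.size - n_params,  # data points left over after fitting
    }
    print(f"order {order}: {summary[order]}")
```

The higher order fit can never score a lower r-square on the same data, so a higher r-square alone does not show that the model is better.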
Unable To Fit Straight Lines
Polynomial equations are not a good choice when you are attempting to fit a data
set that has a profile combining curves and straight lines. When a polynomial
is fitted to a straight-line region, it will become unstable and you will get a
"sine wave" effect.
This is an example of using a 10th order polynomial to curve fit a data set that
has what is called a "Sigmoid" profile.
Notice how this 10th order polynomial fits the middle portion of the data set,
but becomes unstable at the ends of the X data range, where the data forms a
straight line. Polynomial equations will create their characteristic sine wave
pattern when they are fitted to straight lines.
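The sine wave effect can also be detected without a graph. The sketch below assumes NumPy and uses a hypothetical logistic curve as the "Sigmoid" profile. The steepness factor, the X range, and the number of points are arbitrary choices. A true sigmoid only ever rises, so any change of direction in the fitted curve is an artifact of the polynomial.

```python
import numpy as np
from numpy.polynomial import Polynomial

def sigmoid(x):
    # Logistic curve with flat tails; the steepness factor 2.0 is an arbitrary choice
    return 1.0 / (1.0 + np.exp(-2.0 * x))

x_data = np.linspace(-10.0, 10.0, 41)
y_data = sigmoid(x_data)

# Fit a 10th order polynomial to the sigmoid-shaped data
fit10 = Polynomial.fit(x_data, y_data, deg=10)

# Evaluate on a dense grid and count how often the fitted curve changes direction
x_dense = np.linspace(-10.0, 10.0, 401)
slope_signs = np.sign(np.diff(fit10(x_dense)))
direction_changes = int(np.count_nonzero(slope_signs[1:] != slope_signs[:-1]))

# Largest error in the flat right-hand tail, where the data is nearly a straight line
tail = x_dense > 5.0
tail_error = float(np.max(np.abs(fit10(x_dense[tail]) - sigmoid(x_dense[tail]))))

print(f"direction changes in the fitted curve: {direction_changes}")
print(f"largest error in the flat tail (X > 5): {tail_error:.4f}")
```

A nonzero count of direction changes means the fit is oscillating around data that never changes direction.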