During my time in Technical Support,
some people would ask me how I was evaluating their curve fit. Although I was
able to sort of describe it, I finally got around to writing the steps down.
Evaluating a curve fit basically boils down to four steps, which will
help you identify any potential problems:
Step 1 - Visually examine how well the equation fits the data
For this step, ask yourself the following questions:
- Does the equation appear
to fit a lot of noise?
If the equation is fitting noise, it's definitely not going to work
out for you. You could use a robust minimization method (see the sketch after
this list), try another model, or collect more samples. There's no really
simple answer to this.
- Is the equation reasonably
following the data trend?
Think back to the previous example with the quadratic
polynomial.
- Are there any unstable
or undefined regions within the XY data range? Before the first data point?
After the last data point?
Unstable regions are a warning that the curve fit may be invalid. Undefined
regions, which show up most noticeably in the confidence and/or prediction
intervals, suggest that any interpolation or extrapolation around those areas
is best avoided; the curve fit may even be invalid in that range altogether.
Logarithmic graph scaling can help you find unstable and undefined regions.
- Is the curve fit at a local
minimum?
Think back to the examples on local minimum traps.
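If noise or outliers appear to be driving the fit, a robust loss function is one way to down-weight them. Here is a minimal sketch using SciPy's least_squares with a soft_l1 loss; the exponential-decay model, starting values, and data are hypothetical and only for illustration.

import numpy as np
from scipy.optimize import least_squares

# Hypothetical data: exponential decay plus noise and a few injected outliers.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * np.exp(-0.4 * x) + rng.normal(0, 0.05, x.size)
y[::10] += 1.0  # outliers

def residuals(params, x, y):
    a, b = params
    return a * np.exp(-b * x) - y

# 'soft_l1' is a robust loss that reduces the influence of large residuals.
fit = least_squares(residuals, x0=[1.0, 0.1], loss="soft_l1",
                    f_scale=0.1, args=(x, y))
print("robust estimates:", fit.x)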
Step 2 - Display the confidence and prediction intervals, along
with the original data and curve fit
For this step, graph the confidence and prediction intervals along with
the original data and curve fit. Ask yourself the following questions:
- Are there any unstable and/or
undefined regions?
As with step 1 above, unstable and/or undefined regions can indicate problems.
- Do the confidence and/or
prediction intervals follow the curve fit line very closely, or are they very
distant?
The more closely the intervals follow the curve fit, the better.
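As a rough sketch of what this step looks like in practice, the following assumes a quadratic polynomial fit done with statsmodels OLS on made-up data; it overlays the confidence and prediction intervals on the data and the fitted curve.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical quadratic data.
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, x.size)

# Design matrix for a quadratic polynomial: columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
fit = sm.OLS(y, X).fit()

pred = fit.get_prediction(X)
conf = pred.conf_int()            # 95% confidence interval for the fitted mean
pi = pred.conf_int(obs=True)      # 95% prediction interval for new observations

plt.plot(x, y, "o", label="data")
plt.plot(x, pred.predicted_mean, "-", label="curve fit")
plt.fill_between(x, conf[:, 0], conf[:, 1], alpha=0.3, label="confidence interval")
plt.plot(x, pi[:, 0], "--", color="gray", label="prediction interval")
plt.plot(x, pi[:, 1], "--", color="gray")
plt.legend()
plt.show()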
Step 3 - Examine the "t-statistic", standard error and
the confidence intervals for each parameter
Most curve fitting programs display a numerical summary of the results,
covering the quantities described below:
Standard Error:
For a nonlinear fit, the estimated standard errors are approximations (and
typically underestimates) because they can't be calculated exactly, unlike in
a linear least squares procedure. Large standard errors can mean there's too
much noise, redundant parameters, or parameters that are statistically dependent.
t-value:
A large magnitude is ideal, whether positive or negative.
Confidence/Prediction Intervals:
Small intervals are ideal. If the intervals are abnormally large, it's very
likely that the parameter(s) in question can be removed from the model.
P-value:
A small number is ideal. If the number is abnormally large, then the parameter
in question isn't statistically significant in the model and can be removed.
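Here is a minimal sketch of how these quantities can be computed for a nonlinear fit, using SciPy's curve_fit and the approximate covariance matrix it returns; the exponential model and data are hypothetical stand-ins, not the original example.

import numpy as np
from scipy.optimize import curve_fit
from scipy import stats

# Hypothetical exponential-decay data for a nonlinear fit.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 2.5 * np.exp(-0.4 * x) + rng.normal(0, 0.05, x.size)

def model(x, a, b):
    return a * np.exp(-b * x)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 0.1])
se = np.sqrt(np.diag(pcov))                 # approximate standard errors
t_vals = popt / se                          # t-statistic for each parameter
dof = x.size - popt.size                    # degrees of freedom
p_vals = 2 * stats.t.sf(np.abs(t_vals), dof)
half_width = stats.t.ppf(0.975, dof) * se   # 95% confidence half-width

for name, est, s, t, p, hw in zip(("a", "b"), popt, se, t_vals, p_vals, half_width):
    print(f"{name} = {est:.4f}  SE={s:.4f}  t={t:.2f}  p={p:.3g}  "
          f"95% CI = [{est - hw:.4f}, {est + hw:.4f}]")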
Step 4 - Examine the Analysis of Variance (ANOVA) table
Here's an example of an ANOVA table from the quadratic polynomial fit
discussed in the Polynomials pitfall; it reflects an excellent curve fit.
Here are a few items in the ANOVA table to think about:
MSR - The Mean Square Regression value. This should be large, and is influenced
by the number of parameters.
MSE - The Mean Square Error value. This should be small, and is influenced by both
the number of parameters and the number of data points.
F-Statistic - As the MSE gets smaller, this value gets larger. A large value is
good - notice the 175.208 value!
P-Value - The smaller, the better.
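To make the relationships concrete, here is a small sketch that computes MSR, MSE, the F-statistic, and its p-value for a quadratic polynomial fit on made-up data (so the numbers will not match the 175.208 above).

import numpy as np
from scipy import stats

# Hypothetical quadratic data, not the original example's numbers.
rng = np.random.default_rng(3)
x = np.linspace(0, 5, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, x.size)

# Quadratic polynomial fit by linear least squares.
coeffs = np.polyfit(x, y, 2)
y_hat = np.polyval(coeffs, x)

n = x.size
p = coeffs.size                         # number of fitted parameters (3)
ss_reg = np.sum((y_hat - y.mean())**2)  # regression sum of squares
ss_err = np.sum((y - y_hat)**2)         # residual (error) sum of squares

msr = ss_reg / (p - 1)                  # mean square regression
mse = ss_err / (n - p)                  # mean square error
f_stat = msr / mse                      # grows as the MSE shrinks
p_value = stats.f.sf(f_stat, p - 1, n - p)

print(f"MSR={msr:.3f}  MSE={mse:.3f}  F={f_stat:.3f}  p={p_value:.3g}")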