Evaluating a Curve Fit

During my time in Technical Support, people would sometimes ask how I evaluated their curve fits. Although I could sort of describe the process, I finally got around to writing the steps down.

Evaluating a curve fit boils down to four steps, which will help you identify any potential problems:


Step 1 - Visually examine how well the equation fits the data

For this step, ask yourself the following questions:
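Alongside the visual check, a quick numeric summary such as R-squared can flag a poor fit early. Here is a minimal sketch using NumPy; the data and the quadratic model are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical data: y follows a rough quadratic trend in x, with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 33.0, 51.2])

# Fit a quadratic polynomial (degree 2) and evaluate it at the data points.
coeffs = np.polyfit(x, y, 2)
y_hat = np.polyval(coeffs, x)

# Residuals and R^2: a numeric companion to the visual inspection.
residuals = y - y_hat
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean())**2)
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.4f}")
```

A high R-squared alone doesn't prove the model is right, which is exactly why the visual examination comes first: systematic wiggles in the residuals can hide behind a good-looking number.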

Step 2 - Display the confidence and prediction intervals, along with the original data and curve fit

For this step, have the confidence and prediction intervals graphed along with the original data and curve fit. Ask yourself the following questions:
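If your software doesn't draw these bands for you, they can be computed directly. Here is a sketch for a simple straight-line fit, using the standard textbook formulas; the data are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical straight-line data with mild noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

n = len(x)
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# Residual standard error with n - 2 degrees of freedom (two fitted parameters).
dof = n - 2
s = np.sqrt(np.sum((y - y_hat)**2) / dof)
t_crit = stats.t.ppf(0.975, dof)  # 95% two-sided critical value

sxx = np.sum((x - x.mean())**2)
# Confidence band: uncertainty in the fitted mean response.
se_mean = s * np.sqrt(1/n + (x - x.mean())**2 / sxx)
# Prediction band: adds the scatter of an individual new observation.
se_pred = s * np.sqrt(1 + 1/n + (x - x.mean())**2 / sxx)

ci_lower, ci_upper = y_hat - t_crit * se_mean, y_hat + t_crit * se_mean
pi_lower, pi_upper = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
```

Note the extra "1 +" in the prediction-interval term: the prediction band always encloses the confidence band, because it accounts for the noise in a single future point, not just the uncertainty in the curve itself.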

Step 3 - Examine the "t-statistic", standard error and the confidence intervals for each parameter

All curve-fitting programs display a numerical summary of the results, typically including the items described below:



Standard Error:
For a nonlinear fit, the standard errors cannot be calculated exactly the way they can in a linear least squares procedure; they are approximations, and they tend to be underestimates. Large standard errors can mean there's too much noise in the data, redundant parameters, or parameters that are statistically dependent on one another.

t-value:
The t-value is the parameter estimate divided by its standard error. A large magnitude is ideal, whether positive or negative, because it means the parameter is well determined relative to its uncertainty.

Confidence/Prediction Intervals:
Small intervals are ideal. If an interval is abnormally wide, it's very likely that the parameter in question can be removed from the model.

P-value:
A small number is ideal. If the number is abnormally large, then the parameter in question isn't statistically significant in the model and can be removed.
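All of these statistics can be derived from the covariance matrix that a fitting routine returns. Here is a sketch using SciPy's curve_fit; the exponential model, the data, and the parameter names a and b are all hypothetical:

```python
import numpy as np
from scipy import optimize, stats

def model(x, a, b):
    # Hypothetical two-parameter model: y = a * exp(b * x)
    return a * np.exp(b * x)

# Synthetic data generated from a = 1.5, b = 0.8 plus small noise.
x = np.linspace(0, 2, 20)
rng = np.random.default_rng(0)
y = 1.5 * np.exp(0.8 * x) + rng.normal(0, 0.05, x.size)

params, pcov = optimize.curve_fit(model, x, y, p0=(1.0, 1.0))

se = np.sqrt(np.diag(pcov))      # estimated standard errors
t_values = params / se           # large |t| is good
dof = x.size - len(params)
p_values = 2 * stats.t.sf(np.abs(t_values), dof)  # small p is good

# 95% confidence intervals for each parameter.
t_crit = stats.t.ppf(0.975, dof)
intervals = [(p - t_crit * e, p + t_crit * e) for p, e in zip(params, se)]
for name, p, e, t, pv in zip("ab", params, se, t_values, p_values):
    print(f"{name}: {p:.4f} +/- {e:.4f}   t = {t:.1f}   p = {pv:.3g}")
```

Reading the output the same way as the summary above: small standard errors, large t-values, narrow intervals, and tiny p-values all indicate parameters that are earning their place in the model.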


Step 4 - Examine the Analysis of Variance (ANOVA) table

Here's an example of an ANOVA table, from the quadratic polynomial fit discussed in the Polynomials pitfall. As the table shows, this is an excellent curve fit:




Here are a couple of items in the ANOVA table to think about:

MSR - The Mean Square Regression value. This should be large, and is influenced by the number of parameters.
MSE - The Mean Square Error value. This should be small, and is influenced by both the number of parameters and the number of data points.
F-Statistic - The ratio MSR/MSE, so as the MSE gets smaller, this value gets larger. A large value is good - notice the 175.208 value!
P-Value - The smaller, the better.
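The entries in an ANOVA table like this can be reproduced by hand. Here is a sketch for a quadratic polynomial fit; the data are hypothetical, and the degrees of freedom follow the usual regression conventions:

```python
import numpy as np
from scipy import stats

# Hypothetical data following a rough quadratic trend.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.2, 2.8, 7.1, 13.2, 21.0, 30.9, 43.1, 57.2])

degree = 2
coeffs = np.polyfit(x, y, degree)
y_hat = np.polyval(coeffs, x)

n = len(x)
df_reg = degree              # regression degrees of freedom
df_err = n - degree - 1      # error degrees of freedom

ssr = np.sum((y_hat - y.mean())**2)   # regression sum of squares
sse = np.sum((y - y_hat)**2)          # error (residual) sum of squares

msr = ssr / df_reg           # Mean Square Regression -- want this large
mse = sse / df_err           # Mean Square Error -- want this small
f_stat = msr / mse           # grows as MSE shrinks
p_value = stats.f.sf(f_stat, df_reg, df_err)

print(f"MSR = {msr:.2f}   MSE = {mse:.4f}   F = {f_stat:.1f}   p = {p_value:.3g}")
```

Because the F-statistic is MSR divided by MSE, a tight fit (small MSE) on data with real structure (large MSR) produces a large F and a vanishingly small p-value, which is exactly the pattern the example table shows.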