During my time in Technical Support,
some people would ask me how I was evaluating their curve fit. Although I was
able to sort of describe it, I finally got around to writing the steps down.
Evaluating a curve fit basically boils down to four steps, which will
help you identify any potential problems:
Step 1 - Visually examine how well the equation fits the data
For this step, ask yourself the following questions:
- Does the equation appear
to fit a lot of noise?
If the equation is fitting noise, it's definitely not going to work
out for you. You could use a robust minimization method (see the sketch after
this list), try another model, or collect more samples. There's no really
simple answer to this.
- Is the equation reasonably
following the data trend?
Think back to the previous example with the quadratic
polynomial.
- Are there any unstable
or undefined regions within the XY data range? Before the first data point?
After the last data point?
Unstable regions are a warning that the curve fit may be invalid. Undefined
regions, which show up most noticeably in the confidence and/or prediction
intervals, suggest that any interpolation or extrapolation around those areas
is best avoided; the curve fit may even be invalid in that range altogether.
Logarithmic graph scaling can help you find unstable and undefined regions.
- Is the curve fit at a local
minimum?
Think back to the examples on local minimum traps.
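If noise or outliers appear to be driving the fit, a robust loss function is one way to down-weight them. Here is a minimal sketch using SciPy's least_squares with a soft_l1 loss; the exponential-decay model, starting values, and data are hypothetical and only for illustration.

import numpy as np
from scipy.optimize import least_squares

# Hypothetical data: exponential decay plus noise and a few injected outliers.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * np.exp(-0.4 * x) + rng.normal(0, 0.05, x.size)
y[::10] += 1.0  # outliers

def residuals(params, x, y):
    a, b = params
    return a * np.exp(-b * x) - y

# 'soft_l1' is a robust loss that reduces the influence of large residuals.
fit = least_squares(residuals, x0=[1.0, 0.1], loss="soft_l1",
                    f_scale=0.1, args=(x, y))
print("robust estimates:", fit.x)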
Step 2 - Display the confidence and prediction intervals, along
with the original data and curve fit
For this step, graph the confidence and prediction intervals along with
the original data and curve fit. Ask yourself the following questions:
- Are there any unstable and/or
undefined regions?
As with step 1 above, unstable and/or undefined regions can indicate problems.
- Do the confidence and/or
prediction intervals follow the curve fit line very closely, or are they very
distant?
The more closely the intervals follow the curve fit, the better.
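As a rough sketch of what this step looks like in practice, the following assumes a quadratic polynomial fit done with statsmodels OLS on made-up data; it overlays the confidence and prediction intervals on the data and the fitted curve.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical quadratic data.
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, x.size)

# Design matrix for a quadratic polynomial: columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
fit = sm.OLS(y, X).fit()

pred = fit.get_prediction(X)
conf = pred.conf_int()            # 95% confidence interval for the fitted mean
pi = pred.conf_int(obs=True)      # 95% prediction interval for new observations

plt.plot(x, y, "o", label="data")
plt.plot(x, pred.predicted_mean, "-", label="curve fit")
plt.fill_between(x, conf[:, 0], conf[:, 1], alpha=0.3, label="confidence interval")
plt.plot(x, pi[:, 0], "--", color="gray", label="prediction interval")
plt.plot(x, pi[:, 1], "--", color="gray")
plt.legend()
plt.show()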
Step 3 - Examine the "t-statistic", standard error and
the confidence intervals for each parameter
Most curve fitting programs display a numerical summary of the results,
covering the quantities described below:
Standard Error:
For a nonlinear fit, the estimated standard errors are approximations (and
typically underestimates) because they can't be calculated exactly, unlike in
a linear least squares procedure. Large standard errors can mean there's too
much noise, redundant parameters, or parameters that are statistically dependent.
t-value:
A large magnitude is ideal, whether positive or negative.
Confidence/Prediction Intervals:
Small intervals are ideal. If the intervals are abnormally large, it's very
likely that the parameter(s) in question can be removed from the model.
P-value:
A small number is ideal. If the number is abnormally large, then the parameter
in question isn't statistically significant in the model and can be removed.
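Here is a minimal sketch of how these quantities can be computed for a nonlinear fit, using SciPy's curve_fit and the approximate covariance matrix it returns; the exponential model and data are hypothetical stand-ins, not the original example.

import numpy as np
from scipy.optimize import curve_fit
from scipy import stats

# Hypothetical exponential-decay data for a nonlinear fit.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 2.5 * np.exp(-0.4 * x) + rng.normal(0, 0.05, x.size)

def model(x, a, b):
    return a * np.exp(-b * x)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 0.1])
se = np.sqrt(np.diag(pcov))                 # approximate standard errors
t_vals = popt / se                          # t-statistic for each parameter
dof = x.size - popt.size                    # degrees of freedom
p_vals = 2 * stats.t.sf(np.abs(t_vals), dof)
half_width = stats.t.ppf(0.975, dof) * se   # 95% confidence half-width

for name, est, s, t, p, hw in zip(("a", "b"), popt, se, t_vals, p_vals, half_width):
    print(f"{name} = {est:.4f}  SE={s:.4f}  t={t:.2f}  p={p:.3g}  "
          f"95% CI = [{est - hw:.4f}, {est + hw:.4f}]")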
Step 4 - Examine the Analysis of Variance (ANOVA) table
Here's an example of an ANOVA table from the quadratic polynomial fit
discussed in the Polynomials pitfall; it reflects an excellent curve fit.
Here are a few items in the ANOVA table to think about:
MSR - The Mean Square Regression value. This should be large, and is influenced
by the number of parameters.
MSE - The Mean Square Error value. This should be small, and is influenced by both
the number of parameters and the number of data points.
F-Statistic - As the MSE gets smaller, this value gets larger. A large value is
good - notice the 175.208 value!
P-Value - The smaller, the better.
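To make the relationships concrete, here is a small sketch that computes MSR, MSE, the F-statistic, and its p-value for a quadratic polynomial fit on made-up data (so the numbers will not match the 175.208 above).

import numpy as np
from scipy import stats

# Hypothetical quadratic data, not the original example's numbers.
rng = np.random.default_rng(3)
x = np.linspace(0, 5, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, x.size)

# Quadratic polynomial fit by linear least squares.
coeffs = np.polyfit(x, y, 2)
y_hat = np.polyval(coeffs, x)

n = x.size
p = coeffs.size                         # number of fitted parameters (3)
ss_reg = np.sum((y_hat - y.mean())**2)  # regression sum of squares
ss_err = np.sum((y - y_hat)**2)         # residual (error) sum of squares

msr = ss_reg / (p - 1)                  # mean square regression
mse = ss_err / (n - p)                  # mean square error
f_stat = msr / mse                      # grows as the MSE shrinks
p_value = stats.f.sf(f_stat, p - 1, n - p)

print(f"MSR={msr:.3f}  MSE={mse:.3f}  F={f_stat:.3f}  p={p_value:.3g}")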