In this case, we use as error the sum of the squares of the horizontal distances, and it is somewhat perturbing that this equally reasonable approach leads to a best fit that differs from the one obtained when employing vertical distances.
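Writing the line as $y = mx + b$, the two competing error functions (the labels below are only for this comparison) are
\[
E_{\mathrm{vert}}(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2,
\qquad
E_{\mathrm{horiz}}(m, b) = \sum_{i=1}^{n} \left(x_i - \frac{y_i - b}{m}\right)^{2}.
\]
Minimizing the second amounts to regressing $x$ on $y$, and unless the data points are perfectly collinear, the two minimizing lines have different slopes.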
The situation can be made even worse if neither x nor y is a predictor variable. In this case, we want to minimize the sum of the squares of the perpendicular (shortest) distances from the data points to a line y = mx + b, rather than the vertical (or horizontal) distances. The resulting equations will not be linear, nor can they be made linear. However, Maple will be able to find the critical points with no trouble. There will always be at least two (there may be a third, with a huge slope and intercept): one is the minimum, and another is a saddle point. It is worth thinking for a few minutes about the geometric interpretation of the saddle point in terms of the problem at hand.
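Here is a sketch of that computation in Maple, on a small made-up data set (the lists X and Y below are purely illustrative). The squared distance from $(x_i, y_i)$ to the line $y = mx + b$ is $(y_i - m x_i - b)^2/(1 + m^2)$, so we build the total error and ask solve for its critical points:

> X := [0, 1, 2, 3]:  Y := [1, 3, 4, 4]:  n := nops(X):
> # total squared perpendicular distance from the data points to y = m*x + b
> E := add((Y[i] - m*X[i] - b)^2, i = 1 .. n)/(1 + m^2):
> crit := [solve({diff(E, m) = 0, diff(E, b) = 0}, {m, b})];
> # evaluate E at each critical point: the smaller value is the minimum
> seq(evalf(eval(E, s)), s = crit);

For this data set solve returns two critical points; comparing the values of E at each identifies the minimum, and the remaining one is the saddle point mentioned above.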
In practice, the perturbing fact that different but equally reasonable error functions lead to different best fits is resolved by knowing which part of the data is the input and which part is to be predicted from it; this is often, though not always, the case. We can then make a sensible choice for E and solve the corresponding minimization problem. That settled, we are still left with the problem of explaining why our choice of E is a good one.
It is rather easy to write down the solution to the problem in §3: if
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,
\]
then the minimizing slope and intercept are
\[
m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\bar{x}.
\]
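As a quick check, a few lines of Maple confirm that these formulas agree with minimizing the sum of squared vertical distances directly; the data below are again purely illustrative:

> X := [0, 1, 2, 3]:  Y := [1, 3, 4, 4]:  n := nops(X):
> xbar := add(X[i], i = 1 .. n)/n:  ybar := add(Y[i], i = 1 .. n)/n:
> # slope and intercept from the closed-form expressions above
> m0 := add((X[i] - xbar)*(Y[i] - ybar), i = 1 .. n)/add((X[i] - xbar)^2, i = 1 .. n);
> b0 := ybar - m0*xbar;
> # the same values come out of minimizing the vertical-distance error
> E := add((Y[i] - m*X[i] - b)^2, i = 1 .. n):
> solve({diff(E, m) = 0, diff(E, b) = 0}, {m, b});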
If the $y_i$ are assumed to be the values of random variables $Y_i$ which depend linearly upon the $x_i$, say $Y_i = m x_i + b + \varepsilon_i$ with uncorrelated error terms $\varepsilon_i$ of mean 0 and common variance, then the values of $m$ and $b$ given above are unbiased estimators of the true slope and intercept.
This conclusion follows by merely making assumptions about the inner products of the data points $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$. Statisticians often would like to answer questions such as the degree of accuracy of the estimated values of $m$ and $b$. For that, one would have to assume more about the probability distribution of the random variables $Y_i$. A typical situation is to assume that the errors $\varepsilon_i$ are normally distributed, with mean 0 and variance $\sigma^2$. Under these assumptions, the values of $m$ and $b$ given above are the so-called maximum likelihood estimators $\hat{m}$ and $\hat{b}$ for these two parameters, and there is one such estimator for the variance $\sigma^2$ as well. But, since we assumed more, we can also say more. The estimators $\hat{m}$ and $\hat{b}$ are normally distributed and, for example, the mean of $\hat{m}$ is $m$ and its variance is $\sigma^2/\sum_{i=1}^{n}(x_i - \bar{x})^2$. With this knowledge, one may go on to determine the confidence we can place in our estimated values of the parameters. We do not do so here, but we want to plant the idea in the mind of the interested reader, whom we refer to books on the subject.
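To indicate where these statements come from, write $\ell$ for the log-likelihood of the data under the normality assumption:
\[
\ell(m, b, \sigma^2) = -\frac{n}{2}\ln\bigl(2\pi\sigma^2\bigr)
 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - m x_i - b\bigr)^2 .
\]
For any fixed $\sigma^2$, maximizing $\ell$ over $m$ and $b$ is exactly the minimization of the sum of squared vertical distances, which is why the least-squares values reappear as maximum likelihood estimators; maximizing over $\sigma^2$ as well gives
\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{m}\,x_i - \hat{b}\bigr)^2,
\qquad
\hat{m} \sim N\!\left(m,\ \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right).
\]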