Now we relax the restriction that the function we are searching for must pass exactly through each of the data points. This is the typical situation in science, where we make a measurement or do an experiment to gather our y values from the input x values.
More specifically, let's assume that $y = f_{c_1,\dots,c_k}(x_1,\dots,x_m)$ is a real-valued function of $(x_1,\dots,x_m)$ which depends upon $k$ parameters $c_1,\dots,c_k$. These parameters are unknown to us. However, suppose that we can perform repeated experiments that, for given values of $(x_1,\dots,x_m)$, allow us to measure output values for $y$. How can we estimate the parameters $c_1,\dots,c_k$ that best correspond with this information?
Let us assume that the experiment measuring the value $y = f_c(x)$ for specific input values $x = (x_1,\dots,x_m)$ is repeated $n$ times. We will then obtain a system of equations
\[
\begin{aligned}
f_{c_1,\dots,c_k}(x_{11}, x_{12}, \dots, x_{1m}) &= y_1 \\
f_{c_1,\dots,c_k}(x_{21}, x_{22}, \dots, x_{2m}) &= y_2 \\
&\;\;\vdots \\
f_{c_1,\dots,c_k}(x_{n1}, x_{n2}, \dots, x_{nm}) &= y_n
\end{aligned}
\]
Since we may perform the experiments as many times as we wish, we may end up with more relations of the type above than unknowns (that is, $n$ is larger than $k$). The larger $n$ is, the more information we have collected about the coefficients. However, even if the experiments are carried out with great care, they will unavoidably contain some error. The question remains: how can we judiciously estimate the coefficients $c_1,\dots,c_k$ using the collected information about $y = f_c(x)$? What is the best fit?
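To make this setup concrete, here is a minimal Python sketch that simulates n noisy measurements for more inputs than there are parameters. Everything in it is an assumption chosen for illustration: the straight-line model `model`, the helper `simulate_measurements`, the "true" parameter values, and the Gaussian form of the measurement error are all hypothetical and not part of the discussion above.

```python
import random


def model(c, x):
    """Hypothetical model f_c(x) = c[0] + c[1] * x, so k = 2 and m = 1."""
    return c[0] + c[1] * x


def simulate_measurements(true_c, xs, noise_sd, seed=0):
    """Return one observed y_i per input x_i: the model value plus an error.

    The error is assumed to be Gaussian with standard deviation noise_sd;
    a real experiment may behave differently.
    """
    rng = random.Random(seed)
    return [model(true_c, x) + rng.gauss(0.0, noise_sd) for x in xs]


true_c = (1.0, 2.0)             # unknown to us in a real experiment
xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # n = 5 measurements, more than k = 2 parameters
ys = simulate_measurements(true_c, xs, noise_sd=0.1)

for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  observed y = {y:.3f}  exact f_c(x) = {model(true_c, x):.3f}")
```

Because of the error term, no single choice of the two parameters will in general satisfy all five equations at once, which is why a notion of best fit is needed.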
A very common way to answer this question is the method of least squares. The idea is simple: for each realization of the experiment, we measure the fitting error by the distance between the real number $f_c(x_1, x_2, \dots, x_m)$ and the observed value of $y$. For a single measurement, a choice of parameters that minimizes this distance also minimizes the square of the distance. To avoid absolute values, we change our viewpoint slightly and measure the fitting error by $(f_{c_1,\dots,c_k}(x_1, x_2, \dots, x_m) - y)^2$. The error function that considers all the information obtained from the $n$ experiments is then
\[
E(c_1,\dots,c_k) = \sum_{i=1}^{n} \bigl( f_{c_1,\dots,c_k}(x_{i1}, x_{i2}, \dots, x_{im}) - y_i \bigr)^2 .
\]
This turns out to be a function of $c = (c_1,\dots,c_k)$. Mathematically, our best fit problem is now reduced to finding the value of $c$ which produces a minimum for this error function $E$. The details of how this can be done depend intrinsically upon the assumed form of the function $f$ and its relation to the parameters $c_1,\dots,c_k$.
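As one illustration of how the minimization can go for a particular form of f, the sketch below assumes a hypothetical straight-line model f_c(x) = c1 + c2 x with a single input. For this special case, setting the partial derivatives of the sum of squared errors to zero gives the familiar closed-form solution used in `fit_line`. The function names and the sample data are made up for the example; other forms of f generally need other techniques.

```python
def model(c, x):
    """Hypothetical straight-line model f_c(x) = c[0] + c[1] * x."""
    return c[0] + c[1] * x


def squared_error(c, xs, ys):
    """Sum over all measurements of (f_c(x_i) - y_i)^2."""
    return sum((model(c, x) - y) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared_error for the line model.

    Uses the closed-form solution obtained by setting both partial
    derivatives of the error to zero. Requires at least two distinct x values.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return (intercept, slope)


# Made-up measurements, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

c_best = fit_line(xs, ys)
print("best-fit parameters:", c_best)
print("error at best fit:  ", squared_error(c_best, xs, ys))
print("error at (1.0, 2.0):", squared_error((1.0, 2.0), xs, ys))
```

Evaluating `squared_error` at any other parameter pair, as in the last line, should give a value no smaller than the one at `c_best`.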
We should remark that in some cases we might want to use $E(c_1,\dots,c_k)/n$, the mean squared error. This will not change the answer we get (because the minimum occurs at the same value of $c$), but it does allow us to compare data sets with different numbers of data points.
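A small numerical check of this remark, as a sketch only: it assumes a hypothetical one-parameter model f_c(x) = c x, made-up data, and a coarse grid of candidate values for c. Both the total squared error and the mean squared error are evaluated on the same grid, and the candidate that minimizes each is recorded.

```python
def squared_error(c, xs, ys):
    """Sum of squared errors for the hypothetical model f_c(x) = c * x."""
    return sum((c * x - y) ** 2 for x, y in zip(xs, ys))


def mean_squared_error(c, xs, ys):
    """Sum of squared errors divided by the number of data points n."""
    return squared_error(c, xs, ys) / len(xs)


# Made-up measurements, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

# Candidate parameter values 1.00, 1.01, ..., 3.00.
candidates = [i / 100 for i in range(100, 301)]

best_sse = min(candidates, key=lambda c: squared_error(c, xs, ys))
best_mse = min(candidates, key=lambda c: mean_squared_error(c, xs, ys))

print("minimizer of the total squared error:", best_sse)
print("minimizer of the mean squared error: ", best_mse)
```

Dividing by the positive constant n rescales every value of the error function by the same factor, so the two searches pick the same candidate, while the mean squared error stays on a per-measurement scale.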