ABSTRACT

Determining the "best fit" line to a series of data involves minimizing the collective distances between the points and that line, as illustrated in Figure 4.4. Because some of these error distances $e_i$ are positive and others negative, they would partly cancel if summed directly; each is therefore squared, and the line sought is the one that minimizes the sum of the squared error distances. Note that while the y-value predicted by the line misses a particular point by some error distance, the x-value is simply assigned to be the same as that of the actual datum and carries no error. Thus, if $\tilde{y}_i$ represents the line's predicted value for the $i$th point, then

$$\tilde{y}_i = m x_i + b$$

where $m$ is the slope and $b$ is the y-intercept of the line. The sum of the squared errors can therefore be expressed:

$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \left( y_i - \tilde{y}_i \right)^2 = \sum_{i=1}^{N} \left( y_i - m x_i - b \right)^2$$

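where $N$ is the total number of data pairs. The minimum of this function represents the condition of best fit, so its derivatives are taken with respect to the coefficients $m$ and $b$, and each is set equal to zero (since the optimum choice of these coefficients defines the best-fit line):

$$\frac{\partial}{\partial m} \sum_{i=1}^{N} \left( y_i - m x_i - b \right)^2 = -2 \sum_{i=1}^{N} x_i \left( y_i - m x_i - b \right) = 0$$

$$\frac{\partial}{\partial b} \sum_{i=1}^{N} \left( y_i - m x_i - b \right)^2 = -2 \sum_{i=1}^{N} \left( y_i - m x_i - b \right) = 0$$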

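Rearranging:

$$m \sum_{i=1}^{N} x_i^2 + b \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i y_i$$

$$m \sum_{i=1}^{N} x_i + b N = \sum_{i=1}^{N} y_i$$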

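Hence:

$$m = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}, \qquad b = \frac{\sum_{i=1}^{N} y_i - m \sum_{i=1}^{N} x_i}{N}$$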
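The closed-form slope and intercept that follow from solving these two equations need only four running sums over the data. As a rough illustration, the Python sketch below evaluates them directly; the function name `fit_line` and the sample points are made up for this example and are not taken from the original program.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper for illustration: it evaluates the closed-form
    solution of the two normal equations from four running sums.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are identical; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denom  # slope
    b = (sum_y - m * sum_x) / n               # y-intercept
    return m, b


# Made-up points lying exactly on y = 2x + 1
m, b = fit_line([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
print(m, b)  # prints: 2.0 1.0
```

Because the sample points fall exactly on a line, the fit recovers that line's slope and intercept; with scattered data the same sums give the line of minimum squared error.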

In the previous example (Figure 4.4), the values of the sums would be tallied up for each set of five points, and the slope calculated from the above equation would be assigned to the center point. The choice of five points is arbitrary, but the number of points should be odd so that the points are distributed symmetrically about the center point. The following computer program determines the coefficient of expansion (the temperature derivative of expansion; see Section 7) of an expansion/temperature data set using this technique, where the user selects the number of points used for slope determination:
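Before the listing itself, the moving-window idea can be sketched compactly in Python. This is only an illustration under stated assumptions, not a translation of the QuickBASIC program: the names `window_slope` and `moving_slope` are hypothetical, the data are made up (y = x², whose exact derivative is 2x), and points too close to either end to have a full window are simply skipped, which may differ from how the original program treats the ends.

```python
def window_slope(xs, ys):
    """Least-squares slope of the points (xs, ys), from the closed-form sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


def moving_slope(xs, ys, window=5):
    """Return [(x_center, slope), ...] using an odd-sized sliding window.

    Assumption for this sketch: points within window // 2 of either end
    have no full window and are omitted from the result.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number of at least 3")
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    half = window // 2
    result = []
    for center in range(half, len(xs) - half):
        lo, hi = center - half, center + half + 1
        # Fit a line to the window and assign its slope to the center point
        result.append((xs[center], window_slope(xs[lo:hi], ys[lo:hi])))
    return result


# Made-up data: y = x**2, so the true derivative at x is 2*x
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
print(moving_slope(xs, ys, window=5))  # prints: [(2, 4.0), (3, 6.0), (4, 8.0)]
```

The author's own QuickBASIC program follows.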

'This is a Basic (Microsoft Quickbasic 4.5) program for taking
'derivatives of experimental data.