Friday, February 16, 2007

Why least squares?

Introductory courses in econometrics quickly tell students about the use of the least squares criterion in estimating regression equation parameters. The difference between an actual value of the dependent variable Y and its fitted value Yhat is called the residual. Least squares estimators are produced in such a way as to minimise the sum of the squares of these residuals (RSS = Residual Sum of Squares).

Most students will accept that the slope and intercept of a fitted regression line need to be found by some kind of objective method. It is not very reliable just to put down a ruler and draw in the line that seems to give a good fit, balancing in some way the positive and negative errors. But why minimise the sum of the squares? There are other possible objective criteria that could be used. For example, why not just minimise the sum of the absolute deviations of the actual points from the fitted line? (It is easy enough to show that minimising the simple sum of the deviations would not work: the positive and negative errors cancel each other out, so many different lines give a sum of zero and the criterion cannot pick out a single solution.)
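The cancelling-out problem can be checked numerically. The short Python sketch below uses made-up data and hypothetical function names (none of them come from any textbook). It draws several lines through the point of means and shows that the plain sum of deviations is zero for every one of them, while the absolute and squared criteria do tell the lines apart.

```python
# Illustrative data only: five made-up (X, Y) points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]


def intercept_through_means(slope, xs, ys):
    """Intercept that forces a line with the given slope through (X-bar, Y-bar)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar - slope * x_bar


def criteria(slope, xs, ys):
    """Return the three candidate criteria for a line through the point of means."""
    a = intercept_through_means(slope, xs, ys)
    res = [y - (a + slope * x) for x, y in zip(xs, ys)]
    return {
        "sum": sum(res),                  # plain sum: positives and negatives cancel
        "abs": sum(abs(r) for r in res),  # sum of absolute deviations
        "sq": sum(r * r for r in res),    # residual sum of squares (RSS)
    }


if __name__ == "__main__":
    for slope in (-3.0, 0.0, 2.0, 10.0):
        print(slope, criteria(slope, xs, ys))
```

Every slope gives a plain sum of (numerically) zero, so that criterion cannot choose between a sensible line and an absurd one.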

Minimising the squared deviations will apply a greater penalty to points that lie further away from the fitted regression line than if we worked only with the absolute distances of the points from the line. It is sometimes argued that this is a desirable feature of the estimation technique. But in some sense the residual that you get is an estimate of the disturbance for that observation, so you might alternatively ask why atypical observations with apparently big disturbances should be given extra importance. As Peter Kennedy points out in his book A Guide to Econometrics (page 12), even if you use the squared deviations rather than the absolute deviations as a way of getting over the problem of positive and negative errors cancelling out, you don’t have to give each of these squared deviations equal weight. Indeed, as you will find out later in your studies, there may be occasions where it is better to use Weighted Least Squares rather than Ordinary Least Squares as the criterion for determining your estimator.

Of course, one advantage of least squares estimators is that they are computationally straightforward. Applying the criterion using basic methods of calculus gives a simple formula for each regression coefficient in terms of the X and Y data (or, more accurately, their sums, sums of squares and sums of cross-products). Other estimation techniques might require iterative procedures to arrive at an estimate. That is less of a worry in modern times given the advances in computer power, but conceptually it still seems attractive to have a formula that can always be used.
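For the two-variable case the formulas really are that simple. Here is a minimal Python sketch, with a hypothetical function name and made-up data, that computes the intercept and slope from nothing more than the sums, sums of squares and sums of cross-products:

```python
def ols_line(xs, ys):
    """Least squares intercept and slope for Y = a + b*X, from sums alone."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)               # sum of squares of X
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of cross-products

    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("X has no variation, so the slope is not defined")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


if __name__ == "__main__":
    # Made-up points lying exactly on Y = 1 + 2X.
    a, b = ols_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(a, b)
```

No iteration is needed. The one case where the formula breaks down is when every X value is the same, which the sketch reports as an error rather than dividing by zero.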

Least squares estimation might also seem to have a “natural” justification, as W W Sawyer suggested in his wonderful little book The Search for Pattern (published by Penguin Books in 1970 as part of a series called Introducing Mathematics but now unfortunately out of print). In part of the chapter on algebra and statistics (pp. 312-313) he described a mechanical device that could provide a visual confirmation of the least squares criterion. The device consists of a solid piece of wood with nails or screws inserted at places corresponding to the XY values on a graph. Sawyer suggested that a steel rod could be used for the line and that the nails could be connected to the steel rod by elastic bands. To quote what he said: “Things must be arranged in such a way that, if the rod actually passed through one of the points, that point’s band would ‘feel satisfied’ – there would be no tension in it. But the further away the rod is the greater the tension in the band must be; in fact the tension must be proportional to the amount by which the rod misses the point. Things must be so arranged that the bands are compelled to remain upright, as shown in the figure. Each band then is doing its best for the point to which it belongs, and under all these conflicting pulls the rod would eventually come to rest in a position which represented a fair compromise.”
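The link between the device and the least squares criterion can be sketched in a couple of lines of algebra. This is my own gloss rather than Sawyer's working, and it assumes idealised conditions: identical bands with a common stiffness k (a symbol introduced here), tension exactly proportional to the vertical miss as in the quotation, and a rod whose own weight can be ignored.

```latex
% Vertical miss for point i, tension in its band, and total stored energy
\[
e_i = Y_i - (a + bX_i), \qquad
T_i = k\,e_i, \qquad
E(a,b) = \frac{k}{2}\sum_{i=1}^{n} e_i^{2} = \frac{k}{2}\,\mathrm{RSS}.
\]
% The rod rests where the stored energy is at a minimum
\[
\frac{\partial E}{\partial a} = -k\sum_{i=1}^{n} e_i = 0, \qquad
\frac{\partial E}{\partial b} = -k\sum_{i=1}^{n} X_i e_i = 0.
\]
```

The first condition says that the pulls on the rod balance, and the second that they have no net turning effect. Together they are the usual least squares normal equations, so under these assumptions the resting position of the rod is the least squares line.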

I well remember constructing a device of this kind when, with great excitement, I first began teaching econometrics back in the 1970s. I even painted the fitted least squares line on the wood so that students could see the rod settling exactly where the line was. Everything worked perfectly with the first group that I used it with, but by later in the week I guess the elastic bands must have weakened and, just after I had held the board aloft with the metal rod sitting perfectly in place, one of the elastic bands snapped and the rod was fired across the room, narrowly missing one of the students. So my experiment with a physical representation of the least squares line was short-lived. Looking back at Sawyer’s book now I see that he does say that “the device may not be too easy to set up in actual practice”, a phrase that I must have missed in my enthusiasm at the time.

Another justification for using OLS that I remember from my own student days was that, despite its alleged limitations, it was quite robust to departures from its underlying assumptions. Thus, while it might be better to make use of a more complex estimation technique to overcome problems of heteroskedasticity or autocorrelation, those techniques actually require you to have a good idea of the form of autocorrelation or heteroskedasticity that you are going to allow for, something that you might not have. I remember my tutor at the time, Farouk El Sheikh (sadly no longer with us), setting us an essay question: “‘In the country of the blind the one-eyed man is king’. Discuss in relation to the use of OLS and other estimation techniques.” However, he was somewhat taken aback by my answer, which pointed out that in H G Wells' short story, The Country of the Blind, the fully blind inhabitants of the remote South American country eventually decided that the one-eyed man was insane because of the visions that he kept talking about, and so decided that he must be operated on to make him ‘normal’.

Perhaps sometimes visual and literary allusions in econometrics can be taken too far!


4 Comments:

Blogger Will Dwinnell said...

The counter-argument, naturally, is that least absolute errors, not least squared errors, is a more natural measure of performance, and that squaring exaggerates the importance of points which lie farther from the regression line.


-Will

11:41 PM  
Blogger Tyngewick Gawcott said...

Ah, but as in the elastic band example, the least squares position gives the point of minimum potential energy which is what nature generally tries to achieve.

HG Wells was, of course, quoting an existing aphorism and his story was being intentionally ironic.

7:20 AM  
Blogger Ian Maxwell said...

I realize this is an old post indeed to reply to, but here's a justification you may not know: if you assume the errors made in measurement are independent and normally distributed, we may ask for any possible line what is the likelihood of getting these measurements if that line were the correct model. The least-squares line is the line for which that probability is maximized. (This is not too hard to prove, either!)

11:13 PM  
