Thursday, February 21, 2008

Why so many formulae?

Something that puzzles many students of statistics and econometrics is why there might be several different formulae available to calculate the same result.

To take a simple example, Dougherty (2007) says on p. 63 that R squared is ESS/TSS (where ESS stands for the Explained Sum of Squares and TSS stands for the Total Sum of Squares). Then on the next page he calculates it as 1 minus the ratio of the Residual Sum of Squares (RSS) to TSS. A simple bit of algebra shows that this is correct, because TSS = ESS + RSS.

Then on page 115 he gives the formula for the F statistic (for testing the overall significance of a multiple regression model) as [ESS/(k-1)]/[RSS/(n-k)], where n is the number of sample observations and k is the number of parameters to be estimated in the model.

A few lines later he gives another formula for F as [R squared/(k-1)]/[(1 - R squared)/(n-k)].

Of course in both cases one formula is the definition and the other is a simple algebraic transformation that might, in certain circumstances, be more convenient to work with.

In each of these cases it is not difficult to see that the second version follows from the first. If we can break the Total Sum of Squares into two additive parts, ESS + RSS, then R squared = ESS/TSS = (TSS - RSS)/TSS = 1 - (RSS/TSS). For the F statistic, dividing the top and bottom of the formula by TSS leaves its value unchanged. Since ESS/TSS = R squared and RSS/TSS = 1 - R squared, you can replace ESS on the top by R squared and RSS on the bottom by 1 - R squared.
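For anyone who would rather check the algebra with numbers, here is a minimal sketch in plain Python. The data are made up purely for illustration (they are not from Dougherty), the variable names are my own, and I have assumed a simple regression with an intercept, so k = 2. It computes R squared and F both ways and shows that each pair agrees up to rounding error.

```python
# Hypothetical data, invented only to illustrate the identities.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 2.9, 4.8, 4.1, 6.5, 6.0, 8.2, 7.4]

n = len(y)  # number of sample observations
k = 2       # parameters estimated: intercept and slope (an assumption of this sketch)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares estimates for a simple regression with an intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# The three sums of squares.
tss = sum((yi - y_bar) ** 2 for yi in y)                # total
ess = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual

# R squared: the definition and its algebraic transformation.
r2_def = ess / tss
r2_alt = 1 - rss / tss

# F statistic: from the sums of squares, and from R squared.
f_def = (ess / (k - 1)) / (rss / (n - k))
f_alt = (r2_def / (k - 1)) / ((1 - r2_def) / (n - k))

print("TSS:", tss, " ESS + RSS:", ess + rss)
print("R squared:", r2_def, r2_alt)
print("F:", f_def, f_alt)
```

Any small differences in the last few decimal places are floating-point rounding, not a failure of the algebra. Note that the decomposition TSS = ESS + RSS relies on the regression including an intercept.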

It might even be argued that working with these different formulae can actually give you a greater understanding of the meaning and interpretation of the various concepts.


Dougherty, C. (2007) Introduction to Econometrics. Third Edition. Oxford University Press, Oxford.