So far in the book we’ve seen two measures of closeness: MSE (Mean Squared Error) and RSS (Residual Sum of Squares). How do they differ, exactly? Why do we need to use two different measures?
To recap, here’s what MSE looks like:
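The formula itself seems to have gone missing here (probably a dropped image); reconstructing it from the description that follows, with $\hat{y}_i$ denoting the estimate of $y_i$:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$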
So given n observations, it sums the squared difference between each y and its estimate, then averages over n. In other words, it's an empirical average that estimates the expected squared prediction error.
RSS, meanwhile, is just that sum, without the division by n. The difference might sound trivial, but MSE and RSS play different roles. If you remember from Chapter 2, MSE is used as a theoretical measure for comparing various models: at that point we have not picked a model yet, and MSE is what we use to pick one.
RSS, on the other hand, in the context of Chapter 3, is used to pick the coefficients of a model that has already been chosen (in this case, a linear model).
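For completeness, RSS written out in the same style, with $\hat{y}_i$ the estimate of $y_i$:

$$\mathrm{RSS} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = n \cdot \mathrm{MSE}$$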
I'm not yet clear on any deeper significance than this. That is, I get that they're used for different purposes, but it's still not clear to me why. The mathematical relationship between the two is trivial: RSS is n times MSE, so minimizing one minimizes the other.
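A quick numerical sanity check of that claim, on made-up toy data (the data and the use of `np.polyfit` are just for illustration, not anything from the book):

```python
import numpy as np

# Toy data: a noisy linear relationship, purely illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# Fit a line by least squares, then compute predictions.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

residuals = y - y_hat
rss = np.sum(residuals ** 2)    # Residual Sum of Squares: plain sum
mse = np.mean(residuals ** 2)   # Mean Squared Error: same sum / n

# RSS is exactly n times MSE, so they differ only by a constant
# factor, and coefficients minimizing one also minimize the other.
assert np.isclose(rss, len(y) * mse)
```

Since n is a positive constant for a fixed dataset, scaling the objective by 1/n cannot change which coefficients achieve the minimum.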
I feel that I might get a better understanding of this question by reading this PDF here. Something to do over the weekend…