On Day 2, I learned what statistical learning is about: estimating f, a function that maps the independent variables X to the dependent variable Y, for the purpose of either inference or prediction.
Today we look in more detail at the estimation approaches. Broadly speaking, they fall into two categories: parametric and non-parametric.
With the parametric approach, we first make an assumption about the shape of f and only then estimate it. A common assumption is that f is a linear function of X, that is,
$$f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
This simplifies our task considerably: instead of estimating a potentially arbitrary function f, we only need to estimate the p + 1 coefficients of the linear equation.
There are many ways to estimate these coefficients, the most popular being ordinary least squares (OLS), which will be discussed in the next chapter.
Does this approach have disadvantages? You bet it does. If the true f deviates substantially from a linear shape, assuming linearity can give us a very poor estimate.
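OLS gets its own chapter, but a quick preview helps show how small the parametric task becomes. The sketch below is illustrative only: it simulates data from a hypothetical linear f with two predictors (true intercept 2, slopes 3 and -1, noise standard deviation 0.5, all made-up values) and recovers the coefficients with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2
X = rng.normal(size=(n, p))                    # predictors X1, X2
beta_true = np.array([2.0, 3.0, -1.0])         # hypothetical intercept and two slopes
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 0.5, n)  # linear f plus noise

# Prepend a column of ones so the intercept is estimated along with the slopes
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # ordinary least squares fit
print(np.round(beta_hat, 2))  # should land close to [2, 3, -1]
```

Estimating f has become estimating three numbers, regardless of how many observations we have.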
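To see the failure mode, here is a sketch that assumes a hypothetical non-linear true function, f(x) = sin(3x), with a small amount of noise (standard deviation 0.1), and fits a straight line to it anyway. If the linear assumption were right, the training mean squared error would be close to the noise variance; here it stays far above it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, noise_sd = 500, 0.1
x = rng.uniform(-3, 3, n)
y = np.sin(3 * x) + rng.normal(0, noise_sd, n)  # hypothetical non-linear true f

# Fit a straight line (intercept + slope) by ordinary least squares
X_design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
mse_linear = np.mean((X_design @ coef - y) ** 2)

print(f"linear-fit MSE: {mse_linear:.3f}  vs  noise variance: {noise_sd**2:.3f}")
```

Collecting more data does not help here: the gap comes from the wrong shape, not from too few observations.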
The non-parametric approach is exactly what the name suggests: instead of assuming a particular shape for f, it makes no explicit assumption about its functional form. This has the advantage of avoiding a mismatch between the assumed shape and the true f; however, it also has the disadvantage of requiring far more observations (a much larger n) than the parametric approach.
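The need for a large n can be illustrated with k-nearest-neighbours (KNN) regression, a different non-parametric method swapped in here only because it fits in a few lines. It estimates f(x) by averaging the y values of the k training points closest to x, with no assumed shape. The true function sin(3x), the noise level 0.1, k = 10 and the two sample sizes are all assumptions chosen for illustration; the error is measured against the true f on a grid of points.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=10):
    """KNN regression: average the y of the k training points nearest each query."""
    dists = np.abs(x_query[:, None] - x_train[None, :])  # shape (n_query, n_train)
    nearest = np.argsort(dists, axis=1)[:, :k]           # indices of the k closest points
    return y_train[nearest].mean(axis=1)

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)          # hypothetical true function
x_grid = np.linspace(-3, 3, 200)     # points where we compare f_hat to f

test_mse = {}
for n in (50, 2000):
    x = rng.uniform(-3, 3, n)
    y = f(x) + rng.normal(0, 0.1, n)
    pred = knn_predict(x, y, x_grid, k=10)
    test_mse[n] = np.mean((pred - f(x_grid)) ** 2)
    print(f"n={n:>4}: error vs true f = {test_mse[n]:.4f}")
```

With only 50 points, each 10-neighbour average spans a wide stretch of x and smears out the oscillations; with 2000 points the neighbourhoods shrink and the estimate should track sin(3x) closely.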
One such method is the thin-plate spline, which will be discussed in a later chapter. With this approach, we choose a level of smoothness. A rougher (less smooth) spline can fit the training data perfectly, at the risk of overfitting. A smoother spline may not fit the training data as well, but it has lower variance (we'll get to what variance means here soon).
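To make the smoothness trade-off concrete, here is a sketch using SciPy's `RBFInterpolator` (SciPy 1.7 or later) with `kernel="thin_plate_spline"`. Its `smoothing` parameter stands in for the smoothness level: `smoothing=0` interpolates the training points exactly, and larger values give a smoother fit. The example is one-dimensional for simplicity, and the true function sin(x), noise level 0.3 and smoothing values are assumptions chosen for illustration. Each setting is refit on 200 independently drawn noisy training sets, recording the average training MSE and the variance of the prediction at x = 0 across those fits.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
f = np.sin                                   # hypothetical true function
x_train = np.linspace(-3, 3, 50)[:, None]    # fixed design, shape (50, 1)
x0 = np.array([[0.0]])                       # query point lying between two training x's

results = {}
for smoothing in (0.0, 1.0, 100.0):
    preds, train_mses = [], []
    for _ in range(200):                     # 200 independent noisy training sets
        y = f(x_train[:, 0]) + rng.normal(0, 0.3, len(x_train))
        tps = RBFInterpolator(x_train, y, kernel="thin_plate_spline",
                              smoothing=smoothing)
        train_mses.append(np.mean((tps(x_train) - y) ** 2))
        preds.append(tps(x0)[0])
    results[smoothing] = (np.mean(train_mses), np.var(preds))
    print(f"smoothing={smoothing:>6}: train MSE={results[smoothing][0]:.4f}, "
          f"variance of f_hat(0)={results[smoothing][1]:.4f}")
```

The unsmoothed fit should reproduce the training data almost exactly yet give the most variable prediction at x = 0, while larger smoothing trades some training error for a more stable estimate.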