What is Statistical Learning?
Consider a function f that maps X1, X2,…,Xp (also known as independent variables, features, or predictors) to Y (a.k.a.: output, response, or dependent variable). This relationship can be written in a very general form:
Y = f(X) + e
e here is a random error term, which is independent of X, and has a mean of zero.
Statistical learning is a set of approaches for estimating that function f.
Why Estimate f?
Prediction and inference.
Prediction means predicting Y given a set of inputs X, using the estimated f (which for all we know might look very different from the true f, just that it happens to produce an estimate of Y that’s good enough). In other words, we can treat our estimate of f as a black box.
The accuracy of predicting Y depends on two kinds of errors: reducible error and irreducible error. The former is the error that we get because our estimate of f is not perfect. It is reducible because we may be able to find and use better techniques to get ever closer to the true f.
The irreducible error is the e in the equation above. It is irreducible because it’s independent of X, and therefore it will be present even if our estimate of f is perfect! This error may contain unmeasured variables (that is, something that should be an X, but not, because we’re not aware of it), unmeasurable variation, etc.
The focus of statistical learning is minimizing the reducible error. The irreducible error, on the other hand, puts an upper bound whose value is unknown, but whose existence is assured, on the prediction accuracy.
In inference, the focus is not necessarily on predicting Y, but more on how Y changes as X changes. Our estimated f is no longer a black box, because we need to know its form to understand how Y changes in response to X.
For example, we might want to know:
- Which predictors are substantially associated with Y? In a linear equation such as Y = 10 – 23X2+ 0.000001X3, we can say that the last predictor is not substantially associated with Y.
- What is the relationship between the response and each predictor? In the equation above, Y has a positive relationship with X1, and opposite relationship with X2
- Is the relationship between Y and X linear? Or something more complicated?
Depending on the ultimate goal, different methods might be appropriate — a method that is simple and easy to interpret for inference purposes may not yield a prediction that’s as accurate as a method that is not as easily interpretable, and vice versa.