Reading An Introduction To Statistical Learning: Day 2

(This post is part of my 20 Minutes of Reading Every Day project. My current book of choice is An Introduction to Statistical Learning, and today I’m starting with Chapter 2: Statistical Learning.)

What is Statistical Learning?

Consider a function f that maps X1, X2, …, Xp (also known as independent variables, features, or predictors) to Y (also known as the output, response, or dependent variable). This relationship can be written in a very general form:

Y = f(X) + e

Here, e is a random error term that is independent of X and has a mean of zero.

Statistical learning is a set of approaches for estimating that function f. 
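As a concrete sketch of this model (the "true" f below is hypothetical, chosen only for illustration), we can simulate data from Y = f(X) + e and see that even if we knew f exactly, our predictions would still be off by the random error:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" relationship; in practice f is unknown
    return 3.0 * x + 2.0

n = 10_000
x = rng.uniform(0, 1, n)
e = rng.normal(0.0, 0.5, n)   # error term: mean zero, independent of X
y = f(x) + e                  # Y = f(X) + e

# Even with the true f in hand, the residuals are just e:
# their mean is near 0 and their spread matches the noise.
residuals = y - f(x)
print(round(residuals.mean(), 3), round(residuals.std(), 3))
```

The residuals here are exactly the error term e, which is the point: no estimate of f, however good, can explain them away.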

Why Estimate f?

Prediction and inference.


Prediction means predicting Y given a set of inputs X, using the estimated f (which, for all we know, might look very different from the true f; it just happens to produce an estimate of Y that's good enough). In other words, we can treat our estimate of f as a black box.

The accuracy of predicting Y depends on two kinds of errors: reducible error and irreducible error. The former is the error that we get because our estimate of f is not perfect. It is reducible because we may be able to find and use better techniques to get ever closer to the true f.

The irreducible error is the e in the equation above. It is irreducible because it's independent of X, and therefore it will be present even if our estimate of f is perfect! This error may reflect unmeasured variables (something that should be a predictor, but isn't, because we're not aware of it), unmeasurable variation, etc.

The focus of statistical learning is minimizing the reducible error. The irreducible error, on the other hand, puts an upper bound on prediction accuracy: a bound whose value is usually unknown in practice, but whose existence is assured.
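A minimal sketch of this decomposition (synthetic data, with a hypothetical true f and a known noise level): as we fit increasingly flexible models, the test error shrinks toward, but cannot fall below, the variance of e:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.sin(2 * np.pi * x)   # hypothetical true f

sigma = 0.3                         # sd of the irreducible error e
x_train = rng.uniform(0, 1, 200)
y_train = f(x_train) + rng.normal(0, sigma, 200)
x_test = rng.uniform(0, 1, 5000)
y_test = f(x_test) + rng.normal(0, sigma, 5000)

mses = {}
for degree in (1, 3, 9):
    # Polynomial estimate of f; higher degree = more flexible
    coefs = np.polyfit(x_train, y_train, degree)
    mses[degree] = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(degree, round(mses[degree], 3))
```

Better estimates of f drive the reducible error down, but every test MSE stays above Var(e) = sigma**2 = 0.09: that floor is the irreducible error.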


In inference, the focus is not necessarily on predicting Y, but more on how Y changes as X changes. Our estimated f is no longer a black box, because we need to know its form to understand how Y changes in response to X.

For example, we might want to know:

  1. Which predictors are substantially associated with Y? In a linear equation such as Y = 10 + X1 – 23X2 + 0.000001X3, we can say that the last predictor is not substantially associated with Y.
  2. What is the relationship between the response and each predictor? In the equation above, Y has a positive relationship with X1 and a negative relationship with X2.
  3. Is the relationship between Y and X linear? Or something more complicated?
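As a sketch of the inference view (the true coefficients below are made up for illustration), we can fit a linear model and read its coefficients directly, instead of treating the fit as a black box:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 1000
X = rng.normal(size=(n, 3))
# Hypothetical true coefficients; X3's is negligible
y = 10 + 1.0 * X[:, 0] - 23.0 * X[:, 1] + 0.000001 * X[:, 2] \
    + rng.normal(0, 1, n)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta, 2))   # estimated intercept and slopes
```

Reading the fitted slopes answers questions 1 and 2 above: the large negative coefficient on X2 signals a strong negative association, while the near-zero coefficient on X3 suggests it contributes little.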

Depending on the ultimate goal, different methods might be appropriate — a method that is simple and easy to interpret for inference purposes may not yield a prediction that’s as accurate as a method that is not as easily interpretable, and vice versa.

