Reading An Introduction To Statistical Learning: Day 2

(This post is part of my 20 Minutes of Reading Every Day project. My current book of choice is An Introduction to Statistical Learning, and today I’m starting with Chapter 2: Statistical Learning.)

What is Statistical Learning?

Consider a function f that maps X1, X2,…,Xp (also known as independent variables, features, or predictors) to Y (a.k.a.: output, response, or dependent variable). This relationship can be written in a very general form:

Y = f(X) + e

e here is a random error term, which is independent of X, and has a mean of zero.

Statistical learning is a set of approaches for estimating that function f. 

Why Estimate f?

Prediction and inference.


Prediction means predicting Y given a set of inputs X, using the estimated f. For all we know, that estimate might look very different from the true f; what matters is that it produces an estimate of Y that’s good enough. In other words, we can treat our estimate of f as a black box.

The accuracy of predicting Y depends on two kinds of errors: reducible error and irreducible error. The former is the error that we get because our estimate of f is not perfect. It is reducible because we may be able to find and use better techniques to get ever closer to the true f.

The irreducible error is the e in the equation above. It is irreducible because it’s independent of X, and therefore it will be present even if our estimate of f is perfect! This error may contain unmeasured variables (that is, something that should be an X, but is not, because we’re not aware of it), unmeasurable variation, etc.

The focus of statistical learning is minimizing the reducible error. The irreducible error, on the other hand, puts an upper bound on the prediction accuracy. The value of that bound is unknown, but its existence is assured.
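
To make the two kinds of error concrete, here is a minimal Python sketch. Everything in it is made up for illustration and none of it comes from the book: the "true" function true_f, the noise level noise_sd, and the deliberately imperfect estimate imperfect_f are all assumptions.

```python
import random

def true_f(x):
    # Hypothetical "true" relationship, chosen only for illustration.
    return 3.0 + 2.0 * x

def imperfect_f(x):
    # A deliberately imperfect estimate of f (the slope is off).
    return 3.0 + 1.5 * x

def mean_squared_error(predict, xs, ys):
    # Average squared gap between the observed Y and the prediction.
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
noise_sd = 1.0  # standard deviation of e, so Var(e) = 1.0
xs = [random.uniform(0.0, 10.0) for _ in range(20000)]
ys = [true_f(x) + random.gauss(0.0, noise_sd) for x in xs]

mse_perfect = mean_squared_error(true_f, xs, ys)
mse_imperfect = mean_squared_error(imperfect_f, xs, ys)

print(f"MSE using the true f:       {mse_perfect:.3f}")
print(f"MSE using an imperfect f:   {mse_imperfect:.3f}")
```

Even when we predict with the true f, the mean squared error stays close to Var(e) instead of dropping to zero. That leftover is the irreducible part. The extra error of the imperfect estimate is the reducible part.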


In inference, the focus is not necessarily on predicting Y, but more on how Y changes as X changes. Our estimated f is no longer a black box, because we need to know its form to understand how Y changes in response to X.

For example, we might want to know:

  1. Which predictors are substantially associated with Y? In a linear equation such as Y = 10 – 23X2 + 0.000001X3, we can say that the last predictor is not substantially associated with Y.
  2. What is the relationship between the response and each predictor? In the equation above, Y has a positive relationship with X1, and a negative relationship with X2.
  3. Is the relationship between Y and X linear? Or something more complicated?

Depending on the ultimate goal, different methods might be appropriate — a method that is simple and easy to interpret for inference purposes may not yield a prediction that’s as accurate as a method that is not as easily interpretable, and vice versa.


Reading An Introduction To Statistical Learning: Day 1

(This post is part of my 20 Minutes of Reading Every Day project. My current book of choice is An Introduction to Statistical Learning. I skipped the Preface, and went directly to Chapter 1: Introduction.)

I think Chapter 1 can be skipped. The only interesting thing is the premises on which the book is based:

  1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines. So the book is presenting the methods that are most widely applicable.
  2. There is a nice balance between knowing the intricate details of every single building block of statistical learning, vs. knowing enough to know which ones would work best in which scenarios. While the former might make you better at stat learning, the point is that you don’t have to wait until you’ve reached that level before you can create stuff and solve problems. This covers premises 2 and 3.
  3. It comes with lab problems. Which is nice to cement your understanding. I’m not sure if I’ll have the time to do them along with this 20-minute reading project though!

Conventions for Notation

Knowing this in advance will make going through the book easier, so I went through this. Without further ado:

  • n: the number of observations / distinct data points in the sample
  • p: the number of variables available for use in making predictions
  • xij is the value of the jth variable of the ith observation, where i = 1, 2,…, n, and j = 1, 2,…, p.
  • X is then an n × p matrix.
  • xi then represents the ith observation containing the p variable measurements.
  • xj is a vector that contains the n values of the jth variable.
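
As a quick illustration of the notation above, here is a small Python sketch. The data values and the helper names (x_ij, observation, variable) are made up for this example; the book itself works in R.

```python
# A tiny made-up data set: n = 3 observations, p = 2 variables.
X = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]

n = len(X)      # number of observations (rows)
p = len(X[0])   # number of variables (columns)

def x_ij(i, j):
    # Value of the j-th variable for the i-th observation (1-based, like the book).
    return X[i - 1][j - 1]

def observation(i):
    # x_i: the i-th observation, holding its p variable measurements.
    return X[i - 1]

def variable(j):
    # x_j: the n values of the j-th variable.
    return [row[j - 1] for row in X]

print(n, p)
print(x_ij(2, 1))
print(observation(3))
print(variable(2))
```

Rows are observations and columns are variables, so X here is a 3 × 2 matrix.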

20 Minutes of Reading Every Day

One of the most common complaints about books is that we have too many of them, but not enough time to read them. But how much time do you really need to read a book? How about 5 minutes per day? Surely we spend more than that just sitting on the toilet every day?

How about 10? How about 15? If you can spend 15 minutes per day on just focusing on reading a book, you’d probably finish a few in one year… unless you’re reading a tome like this one. I don’t know when I’m ever going to finish that book. I stopped about 15% into the book and haven’t got back to it since then.

So a friend of mine came up with a brilliant idea. A bunch of people agreed on allocating 15 minutes per day to read a book – any book, but ideally a “vegetables” book. You know, books that you know you should be reading, but never get around to. There is little value in choosing fast-food books such as the Dresden Files books, because for some reason I never seem to have any issue finding time to read those.

After 30 days, then we’ll share our findings with the group.

The idea is that 15 minutes is a small enough commitment that most of us would be able to stick to the habit. And after doing that for 30 days, it will hopefully become the “gateway habit” that cements the foundation upon which to build a healthier and more useful reading habit.

Since I like to be ambitious (read: probably overestimating my own will power reserve), I’d like to try with 20 minutes per day. Plus I’d like to try writing a blog post about my reading on that day too. And the book of my choice is An Introduction to Statistical Learning.

Let’s see how it goes!

Learning Machine Learning – What does “Closed-form Solution” mean, exactly?

“Closed-form solution” is something that I encounter again and again whenever I go a bit deeper into math. It can be a bit frustrating trying to understand what it actually means, because the answer you get either doesn’t help you at all if you’re not a math person, like this one:

An equation is said to be a closed-form solution if it solves a given problem in terms of functions and mathematical operations from a given generally-accepted set. For example, an infinite sum would generally not be considered closed-form. However, the choice of what to call closed-form and what not is rather arbitrary since a new “closed-form” function could simply be defined in terms of the infinite sum.

Right, so it’s arbitrary. Thanks. Or even worse, this one:

A discrete function A(n,k) is called closed form (or sometimes “hypergeometric”) in two variables if the ratios A(n+1,k)/A(n,k) and A(n,k+1)/A(n,k) are both rational functions. A pair of closed form functions (F,G) is said to be a Wilf-Zeilberger pair if…

Hypergeometric, rational functions, and Wilf-Zeil… let me just stop there. Look, I understand that the above explanations are probably rigorously correct mathematically, and I can appreciate that. The thing is that they’re not very useful for me in trying to understand what the term actually means!

Instead, let me just attempt to convey the idea using very simple examples, just like the ones used in the Wikipedia page.

The quadratic polynomial

ax^2+bx+c = 0

has a closed-form solution, because its roots

x = \frac{ { - b \pm \sqrt {b^2 - 4ac} }}{{2a}}

can be expressed in a finite number of elementary operations. By “elementary operations” here we mean subtraction, power, square root, division, and multiplication. Now, in this particular example, I think everyone can agree that the operations are elementary. But for more complex operations/functions, there is no general agreement on where the “elementary” label stops, exactly. This is why they say that the definition of closed-form solution is somewhat arbitrary.
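
Because the formula uses only a handful of operations, it translates directly into a few lines of code. Here is a minimal Python sketch; the function name quadratic_roots is my own, and I use cmath so that a negative discriminant gives complex roots instead of an error.

```python
import cmath

def quadratic_roots(a, b, c):
    # Roots of a*x^2 + b*x + c = 0 via the quadratic formula.
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    # cmath.sqrt also handles a negative discriminant (complex roots).
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)

# Example: x^2 - 3x + 2 = (x - 1)(x - 2)
r1, r2 = quadratic_roots(1, -3, 2)
print(r1, r2)
```

No loops and no approximation: a fixed, finite number of operations gives the exact roots.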

Now, let’s look at an example of non-closed-form expressions. This expression

S = \sum_{i=0}^{\infty} \frac{x}{2^i}

is not in closed form, because even though the operations are simple additions, there’s an infinite number of them. Adding the terms one by one, even the most powerful supercomputer in the universe would never reach the final result of this sum!

But sometimes, a non-closed-form expression can be expressed in the closed form. For example, the infinite sum above can be manipulated into a closed form expression by multiplying the sum by \frac{1}{2} and subtracting the result from the original sum. That is :

S = x + \frac{x}{2} + \frac{x}{4} + \frac{x}{8} + ...

S - \frac{1}{2}S = (x + \frac{x}{2} + \frac{x}{4} + \frac{x}{8} + ...) - (\frac{x}{2} + \frac{x}{4} + \frac{x}{8} + \frac{x}{16} + ...)

\frac{1}{2}S = x

S = 2x , a decidedly closed-form expression.

That’s the value of expressing something in closed-form. Now you don’t even need a calculator to know what the sum is.

The problem is, not every expression has a closed-form equivalent. For example, it follows from Galois theory (the result is known as the Abel-Ruffini theorem) that, unlike quadratic polynomials, polynomials of degree five or higher (that is, quintic, sextic, septic, octic, etc.) have no general closed-form solution in terms of radicals.

That is, we have a general formula to find the roots of quadratic polynomials (above), we have a formula for third degree (cubic) polynomials, and we have a formula for fourth degree (quartic) polynomials. But no such thing exists for fifth degree and beyond. So what can we do?

This is where numerical methods come in. Essentially, instead of trying to find the exact solution, numerical methods give us a way to approximate the answer until we’re close enough. As a very simple example, consider the infinite sum we just discussed. How far do you think we have to go to get a result that’s very close to the exact result? Not very far.

Approximating the infinite sum
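
Here is a minimal Python sketch of that approximation. The value x = 5.0 and the function name partial_sum are arbitrary choices for illustration. It adds up only the first few terms and compares the result with the closed-form answer 2x.

```python
def partial_sum(x, n_terms):
    # Sum of the first n_terms terms of x / 2**i, for i = 0, 1, 2, ...
    return sum(x / 2 ** i for i in range(n_terms))

x = 5.0
exact = 2 * x  # the closed-form result S = 2x

for n_terms in (1, 2, 5, 10, 20):
    approx = partial_sum(x, n_terms)
    print(f"{n_terms:2d} terms: {approx:.6f} (gap {exact - approx:.6f})")
```

The gap to the exact value halves with every extra term, so a couple of dozen terms are already very close to 2x.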

So, there it is, a post about closed-form solutions. Phew. I think I spent way more time crafting the LaTeX expressions than I did writing the post!

Learning Machine Learning – Intro & Roadmap

(Last revision: 2014-09-30. Added Foundation Courses section.)

I’ve been having lots of fun with Machine Learning lately. It all started with taking Andrew Ng’s excellent Coursera Machine Learning course. Andrew is a fantastic instructor, with a knack for conveying complex concepts while not getting you bogged down in not-yet-relevant details that’d take away from your getting the big picture.

While taking the course, I’ve been creating a lot of draft posts, mostly about digging deeper into topics on which I’m still shaky. Most of the time, this is because of the math involved. Machine Learning requires a lot of math, and it’s on a level that’s way, way deeper than the math you’ll ever see in the CFA curriculum.

(That said, if you’re a CFA candidate or charterholder, don’t let this discourage you! The math you encounter in the CFA curriculum serves as an excellent foundation for the math you’ll see in Machine Learning. Here’s a quote from the Preface of the Applied Predictive Modeling book:

For this text, the reader should have some knowledge of basic statistics, including variance, correlation, simple linear regression, and basic hypothesis testing (e.g.: p-values and test statistics).

That’s level 2 all over again, folks! 🙂 )

As part of my learning process, I’d like to create a collection of posts about Machine Learning, with the hope that the process will strengthen my own understanding, and maybe (who knows!) even help someone who’s interested in the topic, but lacking the mathematical tools and a good roadmap to follow to get better at this (i.e.: just like where I am today).


  1. Take Andrew Ng’s Machine Learning Course. (DONE)
  2. Go through the Applied Predictive Modeling book, and pick up the R language in the process. (ONGOING)
  3. (To be determined)

Math Basics


Foundation Courses

  • Probabilistic Systems Analysis and Applied Probability. I wish there were something like this when I was still in school/university back then (yes, that was a long time ago). This is an excellent course that you can take at your own pace, and it’s as good as it gets, complete with quizzes, practice problems, and solutions.

Machine Learning Courses

Machine Learning Books

  • Applied Predictive Modeling. I’m currently reading this book. I bought it because it’s (1) an introductory and (2) a very hands-on book (at least the reviews say so).
  • An Introduction to Statistical Learning. A book with equally glowing reviews on Amazon. I picked Applied Predictive Modeling first because (again, from the reviews) it provides a more complete coverage of the entire process.
  • The Elements of Statistical Learning. Available in its entirety in PDF. The math is a bit too advanced for me though. There are just a few too many unfamiliar terms, which I find distracting. I can persevere and look the words up one by one, but I suspect I’d have a more productive time with a gentler book (hence my choice to go with the Applied Predictive Modeling book).
  • Pattern Recognition and Machine Learning. Another book of which many people speak very highly. I did not pick this book because it’s not available in soft copy (unlike the APM book, which has a Kindle version that’s supposedly an exact replica of the printed book), and most importantly, I get the sense that it’s more advanced than the APM book.
  • (To be updated)

2014: The Year of Being Gentle To Yourself

How many of you even remember what your 2013 resolutions were?

I remember that I didn’t make any special resolutions in 2012.

Because, although a new year surely feels like you’re closing a chapter of your life and starting a new one, it’s much easier to fall back into old harmful patterns in mid-January than to keep doing whatever uncomfortable new resolutions you made on December 31st.

And I think that point about the new resolutions being uncomfortable is key. That’s partly the reason why resolutions are so hard.

We tend to think that when the clock turned to 00:00, January 1st 2014, we became this new person living a new chapter in a book (or a new episode in our “My Life” TV series). But the fact is that we are still we. If we found it impossibly difficult to wake up at 5AM and jog every morning in 2013, we’ll find it just as impossible to do it in 2014. The year changes when the last second ends, but we are still who we are.

Does it mean changing for the better is impossible? Of course not. But unless one experiences a truly life-changing episode, I don’t think an instant change is possible for most of us. Change is gradual, and for most (if not all) of the time, happens because we’re either looking for pleasure, or avoiding pain.

That’s why resolutions such as “run 5km every single day at 5AM” are probably not going to stick. It’s so painful! Your brain registers pain, pain, pain, and soon it’ll go back to sleeping until 8AM. I think the key to lasting change is being gentle with your brain: push yourself hard enough that you experience pleasure, but no more. That way your brain registers your new habit as a pleasurable thing to do, and not a painful activity to avoid.

That’s why, while I don’t have any resolutions this year, I promised myself I am going to take it easy. I’m going to continue improving my exercise routine. I’m going to continue improving my brain exercise routine. I’m going to continue improving my diet.

But gone are the days of “starting from January 1st, I’m going to <insert drastic painful changes>”.

Because that doesn’t work. What works are gentle, gradual changes. Here’s to a gentler, but consequently more productive 2014!

Tiny Baby Steps to Good Habits

Installing a new habit is hard. Really hard. Especially, it seems, when it’s a good habit.

Bad habits? Damn, we don’t even have to try, do we? Habits that make you hate yourself after you’ve done them for the millionth time… those are the ones that most easily stick forever.

I don’t know about you, but I’ve tried the 18-, 21-, 28-, and 30-day programs before, and they didn’t work. While I was able to stick to the habit for 18, 21, 28, 30, whatever days… after that I simply stopped doing it.

Some of you (yes, you, one of the 3 or 4 people who read this blog) know that I’ve put up USD3500 as a commitment to blog about CFA level 3 exam material regularly. The idea is that doing this helps ensure that I’m familiar with the material and that I start preparing early.

But it’s hard. It’s hard to make the habit stick. I have the material in my iPad, and when I open my iPad after a hard day of work, what do I do? I open a Marvel comic. Is it a matter of willpower? Is it a matter of motivation? Maybe and maybe.

Which is why I found this approach so interesting. The idea is that you start with such tiny baby steps, that willpower and motivation don’t even come into the picture. If you can’t even take these tiny baby steps, then you might want to ask yourself again whether you really want the habit in the first place!

We will see. I have joined the Sept 10-14 session — let’s see if it works!