Ten Quotes from Jeremy Howard's Machine Learning MOOC
I recently finished fast.ai’s Introduction to Machine Learning for Coders MOOC. There’s a lot I liked about this course, and I’ll elaborate on that in a separate blog post. But for now, I’ll share some memorable quotes from Jeremy Howard - the course’s instructor - that helped me understand the scope of machine learning and its current state.
Note: quotes are chronologically ordered.
-
“Data science is not software engineering. There’s a lot of overlap…but what we’re doing right now is prototyping models.” [Howard, Lesson 1]
Before I got my hands on machine learning, I assumed data science was a sub-discipline of software engineering. Jeremy explains that there are many similarities between data science and software engineering, but data science requires a different approach. Indeed, Jeremy’s workflow reminded me of my experience with building computational models in MATLAB. Efficiently prototyping machine learning models requires a different set of practices than conventional programming does.
-
“The world of machine learning has become very empirical…the difference between theory and practice is so huge.” [Howard, Lesson 1]
The 1990s are referred to as a period of little practical progress in machine learning. Jeremy states that theoretically elegant models such as SVMs (support vector machines) dominated academia during that decade but failed to produce practical results.
-
“My rule of thumb is that if something takes more than 10 seconds to run, it’s too long for me to do interactive analysis with it.” [Howard, Lesson 2]
The Jupyter Notebook is not just a teaching tool, but a common data science tool for coding interactively. Jeremy personally attested to the efficiency of programming interactively rather than periodically compiling code. The ability to continuously analyze, prototype, and refine data models in real time is a priority.
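To make the 10-second rule concrete, here is a minimal sketch of one way to keep a notebook cell fast: time the step, and run it on a random subsample while exploring. The helper names (`timed`, `subsample`) and the toy workload `mean_of_squares` are my own illustrations, not something taken from the course.

```python
import random
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def subsample(rows, n, seed=0):
    """Return up to n rows chosen at random without replacement."""
    if len(rows) <= n:
        return rows
    return random.Random(seed).sample(rows, n)


def mean_of_squares(rows):
    """Hypothetical stand-in for a slow analysis step."""
    return sum(x * x for x in rows) / len(rows)


data = list(range(1_000_000))

# Explore on a small sample first; scale up once the idea looks promising.
sample = subsample(data, 10_000)
result, elapsed = timed(mean_of_squares, sample)
print(f"{len(sample)} rows took {elapsed:.4f}s")
```

If the elapsed time creeps past the budget, shrinking the sample is usually the cheapest fix, at the cost of noisier results.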
-
“The whole point of machine learning is to identify which variables actually matter the most, and how do they relate to each other and your dependent variable together.” [Howard, Lesson 2]
Between over-hyped media coverage and terse technical jargon, Jeremy cuts to the essence of machine learning.
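One common way to ask "which variables matter the most" is permutation importance: shuffle one column at a time and measure how much the model's error grows. The sketch below is my own illustration in plain Python, not code from the course. The `predict` function is a hypothetical stand-in for an already-trained model, and the data is synthetic.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=0):
    """Return the increase in MSE when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for col in range(len(X[0])):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this one column permuted.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        scores.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return scores


# Synthetic data: the target depends strongly on column 0, weakly on
# column 1, and not at all on column 2.
rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(500)]
y = [3.0 * a + 0.5 * b for a, b, _ in X]


def predict(row):
    """Hypothetical stand-in for a trained model."""
    return 3.0 * row[0] + 0.5 * row[1]


importances = permutation_importance(predict, X, y)
for col, score in enumerate(importances):
    print(f"column {col}: error increase {score:.4f}")
```

Shuffling a column the model relies on hurts the predictions a lot, while shuffling an ignored column changes nothing, so the scores give a rough ranking of the variables.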
-
“The details are difficult. They’re not difficult like intellectually difficult. They’re kind of difficult in the way that makes you want to head back to your desk at 2 am.” [Howard, Lesson 3]
Jeremy cautions that many of the hardest challenges in machine learning are encountered when trying to properly implement an idea in your model.
-
“If you get a detail wrong, much of the time it’s not going to give you an exception. It will just silently be slightly less good than it otherwise would have been…You just don’t know if your company’s model is like half as good as it could be because you made a little mistake.” [Howard, Lesson 3]
The evaluation metric of a machine learning model is not binary. Given that we lack a true measuring stick in the real world, there will be an inherent level of uncertainty that we need to deal with.
-
“If you don’t have a good validation set, it’s hard - if not impossible - to create a good model.” [Howard, Lesson 3]
This insight reminds us of the limitations of machine learning. A machine learning model will not provide ROI in an environment that it hasn’t been properly trained for.
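As an illustration of what a validation set that resembles the real environment might look like, here is a small sketch under one assumption of mine: the model will be used to predict future rows, so the most recent rows are held out instead of a random sample. The function name `time_based_split` and the toy `day`/`sales` records are hypothetical.

```python
def time_based_split(rows, date_key, valid_fraction=0.2):
    """Sort rows by date_key and hold out the most recent fraction."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    n_valid = int(len(ordered) * valid_fraction)
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


# Toy records, deliberately out of order.
rows = [{"day": d, "sales": 100 + d} for d in range(10, 0, -1)]

train, valid = time_based_split(rows, "day")
print("train days:", [r["day"] for r in train])
print("valid days:", [r["day"] for r in valid])
```

A random split would let the model see days that come after some of its validation rows, which can make its score look better than it will be in practice.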
-
“The vast majority of machine learning models don’t automate anything. They’re designed to provide information to humans.” [Howard, Lesson 6]
Jeremy has rejected the notion of machine learning being a black box on several occasions. Machine learning is an augmenting tool used to extract more insight out of the data we have. These machine-learned insights allow us to efficiently focus our attention on the features and details that have the greatest impact.
-
“The best practitioners I know in machine learning all share one particular trait in common, which is they’re very, very tenacious…another thing which is they’re very good coders. They’re very good at turning their ideas into code.” [Howard, Lesson 11]
I imagine that most people would not be surprised to hear that tenacity translates to success. But Jeremy expanded on the feelings of frustration and failure that one should expect to encounter when working on machine learning models. He also emphasized that coding is a compulsory activity in the realm of data science.
-
“The bias in the data creates a bias in the software.” [Howard, Lesson 12]
The course ended with a presentation on the ethical challenges of data science. A series of case studies displayed the unintended consequences that can result from machine learning models. While there is no clear answer to the ethical issues, it is evident that a model trained with biased data can be dangerous.