Feb 2017
34m 29s

MLG 005 Linear Regression

OCDevel
About this episode

Linear regression is introduced as the foundational supervised learning algorithm for predicting continuous numeric values, using the estimation of Portland house prices as a running example. The episode explains the three-step process of machine learning - prediction via a hypothesis function, error measurement with a cost function (mean squared error), and parameter optimization through gradient descent - and details both the univariate linear regression model and its extension to multiple features.

Links

Linear Regression

Overview of Machine Learning Structure

  • Machine learning is a branch of artificial intelligence and is closely related to fields such as statistics, operations research, and control theory.
  • Within machine learning, supervised learning involves training with labeled examples and is further divided into classification (predicting discrete classes) and regression (predicting continuous values).

Linear Regression and Problem Framing

  • Linear regression is the simplest and most commonly taught supervised learning algorithm for regression problems, where the goal is to predict a continuous number from input features.
  • The episode example focuses on predicting the cost of houses in Portland, using square footage and possibly other features as inputs.

The Three Steps of Machine Learning in Linear Regression

  • Machine learning in the context of linear regression follows a standard three-step loop: make a prediction, measure how far off the prediction is, and update the prediction method to reduce mistakes.
  • Prediction uses a hypothesis function (also called an objective or estimate) that maps input features to a predicted value.

The Hypothesis Function

  • The hypothesis function is a formula that multiplies input features by coefficients (weights) and sums them to make a prediction; in mathematical terms, for one feature, it is: h(x) = theta_1 * x_1 + theta_0
    • Here, theta_1 is the weight for the feature (e.g., square footage), and theta_0 is the bias (an average baseline).
  • With only one feature, the model tries to fit a straight line to a scatterplot of the input feature versus the actual target value.
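
To make the formula concrete, here is a minimal Python sketch of the univariate hypothesis; the parameter values and the example house size are invented for illustration, not taken from the episode.

```python
def hypothesis(x, theta_0, theta_1):
    """Predict a price from a single feature x (e.g., square footage)."""
    return theta_0 + theta_1 * x

# Hypothetical parameters: a 50,000 baseline (bias) plus 130 per square foot.
theta_0, theta_1 = 50_000, 130
print(hypothesis(1_500, theta_0, theta_1))  # 50_000 + 130 * 1_500 = 245_000
```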

Bias and Multiple Features

  • The bias term acts as the starting value when all features are zero, representing an average baseline cost.
  • In practice, using only one feature limits accuracy; including more features (like number of bedrooms, bathrooms, or location) yields multivariate linear regression: h(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 + ... + theta_n * x_n, with one weight theta per feature x.
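
A sketch of that multivariate form follows; the feature list and theta values are hypothetical.

```python
def hypothesis(features, thetas):
    """thetas[0] is the bias theta_0; thetas[i] pairs with features[i - 1]."""
    prediction = thetas[0]
    for x_i, theta_i in zip(features, thetas[1:]):
        prediction += theta_i * x_i
    return prediction

# Hypothetical features: [square footage, bedrooms, bathrooms]
print(hypothesis([1_500, 3, 2], [50_000, 120, 10_000, 5_000]))
```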

Visualization and Model Fitting

  • Visualizing the problem involves plotting data points in a scatterplot: feature values on the x-axis, actual prices on the y-axis.
  • The goal is to find the line (in the univariate case) that best fits the data, ideally passing through the "center" of the data cloud.
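
A small matplotlib sketch of that scatterplot and a candidate fitted line; the data points and the line's parameters are made up for illustration.

```python
import matplotlib.pyplot as plt

# Invented Portland-style data: square footage vs. sale price.
sqft = [1_100, 1_400, 1_800, 2_200, 2_600]
price = [220_000, 265_000, 330_000, 395_000, 450_000]

plt.scatter(sqft, price, label="houses")
plt.plot(sqft, [60_000 + 150 * x for x in sqft], label="hypothesis line")
plt.xlabel("square footage")
plt.ylabel("price ($)")
plt.legend()
plt.show()
```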

The Cost Function (Mean Squared Error)

  • The cost function, or mean squared error (MSE), measures model performance by averaging squared differences between predictions and actual labels across all training examples.
  • Squaring ensures positive and negative errors do not cancel each other, and dividing by twice the number of examples (2m) simplifies the calculus in the next step.
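
A short sketch of that cost function, written to match the 1/(2m) convention described above; the hypothesis helper and the data layout (a list of feature lists and a list of labels) are assumptions for illustration.

```python
def hypothesis(x, thetas):
    """x is a list of feature values; thetas[0] is the bias."""
    return thetas[0] + sum(t * x_i for t, x_i in zip(thetas[1:], x))

def cost(X, y, thetas):
    """J(theta) = (1 / (2m)) * sum of (h(x_i) - y_i)^2 over all m training examples."""
    m = len(y)
    squared_errors = [(hypothesis(x_i, thetas) - y_i) ** 2 for x_i, y_i in zip(X, y)]
    return sum(squared_errors) / (2 * m)
```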

Parameter Learning via Gradient Descent

  • Gradient descent is an iterative algorithm that uses calculus (specifically derivatives) to find the best values for the coefficients (thetas) by minimizing the cost function.
  • The cost function's surface can be imagined as a bowl in three dimensions, where each point represents a set of parameter values and the height represents the error.
  • The algorithm computes the slope at the current set of parameters and takes a proportional step (controlled by the learning rate alpha) in the direction of steepest decrease.
  • This process is repeated until reaching the lowest point in the bowl, where error is minimized and the model best fits the data.
  • Training will not produce a perfect zero error in practice, but it will yield the lowest achievable average error for the data given.
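
A minimal gradient descent loop for the univariate case, sketched in Python; the tiny dataset, learning rate, and iteration count are illustrative assumptions, not values from the episode.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=5_000):
    """Learn theta_0 (bias) and theta_1 (slope) by repeatedly stepping downhill."""
    theta_0, theta_1 = 0.0, 0.0
    m = len(ys)
    for _ in range(iterations):
        errors = [(theta_0 + theta_1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the 1/(2m) mean squared error cost.
        grad_0 = sum(errors) / m
        grad_1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously, scaled by the learning rate alpha.
        theta_0 -= alpha * grad_0
        theta_1 -= alpha * grad_1
    return theta_0, theta_1

# Made-up dataset: square footage in thousands vs. price in thousands of dollars.
print(gradient_descent([1.0, 1.5, 2.0, 2.5], [200, 260, 310, 370]))
```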

Extension to Multiple Variables

  • Multivariate linear regression extends all concepts above to datasets with multiple input features, with the same process for making predictions, measuring error, and performing gradient descent.
  • The technical details are essentially the same, though visualization becomes impractical as the number of features grows beyond two or three.
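
To show that the same three steps carry over, here is a vectorized NumPy sketch for multiple features; the training data and hyperparameters are invented, and in practice the features would be scaled so a larger learning rate could be used.

```python
import numpy as np

# Hypothetical training data: [square footage, bedrooms] per house, and sale prices.
X = np.array([[1_400.0, 3.0], [1_600.0, 3.0], [2_100.0, 4.0]])
y = np.array([280_000.0, 330_000.0, 420_000.0])

X = np.hstack([np.ones((len(X), 1)), X])   # prepend a column of 1s so theta[0] is the bias
theta = np.zeros(X.shape[1])
alpha, m = 1e-7, len(y)                    # tiny learning rate because the features are unscaled

for _ in range(100_000):
    errors = X @ theta - y                 # step 1: predict; step 2: measure the error
    theta -= alpha * (X.T @ errors) / m    # step 3: gradient step, same formula as the univariate case

print(theta)
```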

Essential Learning Resources

  • The episode strongly directs listeners to the Andrew Ng course on Coursera as the primary recommended starting point for studying machine learning and gaining practical experience with linear regression and related concepts.