Nov 2024
1h 6m

Build LLMs From Scratch with Sebastian R...

Neil Leiser
About this episode

Our guest today is Sebastian Raschka, Senior Staff Research Engineer at Lightning AI and bestselling book author.

In our conversation, we first talk about Sebastian's role at Lightning AI and what the platform provides. We also dive into two great open source libraries that they've built to train, finetune, deploy and scale LLMs: PyTorch Lightning and LitGPT.

In the second part of our conversation, we dig into Sebastian's new book: "Build a Large Language Model From Scratch". We discuss the key steps needed to train LLMs, the differences between GPT-2 and more recent models like Llama 3.1, multimodal LLMs and the future of the field.

If you enjoyed the episode, please leave a 5-star review and subscribe to the AI Stories YouTube channel.

Build a Large Language Model From Scratch Book: https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167

Blog post on Multimodal LLMs: https://magazine.sebastianraschka.com/p/understanding-multimodal-llms

Lightning AI (with PyTorch Lightning and LitGPT repos): https://github.com/Lightning-AI

Follow Sebastian on LinkedIn: https://www.linkedin.com/in/sebastianraschka/

Follow Neil on LinkedIn: https://www.linkedin.com/in/leiserneil/  

---

(00:00) - Intro

(02:27) - How Sebastian got into Data & AI

(06:44) - Regressions and loss functions

(13:32) - Academia to joining Lightning AI

(21:14) - Lightning AI vs. other cloud providers

(26:14) - Building PyTorch Lightning & LitGPT

(30:48) - Sebastian’s role as Staff Research Engineer

(34:35) - Build an LLM From Scratch

(45:00) - From GPT-2 to Llama 3.1

(48:34) - Long Context vs. RAG

(56:15) - Multimodal LLMs

(01:03:27) - Career Advice

