logo
episode-header-image
Feb 2025
1h 7m

Accelerating AI Training and Inference w...

Sam Charrington
About this episode

Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting its systolic array-based compute design, and how it balances performance across key dimensions like compute, memory bandwidth, memory capacity, and network bandwidth. We also discuss the Trainium tooling ecosystem including the Neuron SDK, Neuron Compiler, and Neuron Kernel Interface (NKI). We also dig into the various ways Trainum2 is offered, including Trn2 instances, UltraServers, and UltraClusters, and access through managed services like AWS Bedrock. Finally, we cover sparsity optimizations, customer adoption, performance benchmarks, support for Mixture of Experts (MoE) models, and what’s next for Trainium.


The complete show notes for this episode can be found at https://twimlai.com/go/720.

Up next
Aug 19
Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743
Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-u ... Show More
1h 1m
Aug 12
Closing the Loop Between AI Training and Inference with Lin Qiao - #742
In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essentia ... Show More
1h 1m
Jul 29
Context Engineering for Productive AI Agents with Filip Kozera - #741
In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection lo ... Show More
46m 1s
Recommended Episodes
Oct 2017
Data science tools and other announcements from Ignite
In this episode, Microsoft's Corporate Vice President for Cloud Artificial Intelligence, Joseph Sirosh, joins host Kyle Polich to share some of the Microsoft's latest and most exciting innovations in AI development platforms. Last month, Microsoft launched a set of three powerful ... Show More
31m 40s
Jan 2025
#229 Mitesh Agrawal: Why Lambda Labs’ AI Cloud Is a Game-Changer for Developers
This episode is sponsored by Netsuite by Oracle, the number one cloud financial system, streamlining accounting, financial management, inventory, HR, and more.   NetSuite is offering a one-of-a-kind flexible financing program. Head to  https://netsuite.com/EYEONAI to know more.  ... Show More
56m 7s
Oct 2024
#692: A Discussion About Serverless and How to Make the Most of It
Simon is joined by Stephen Liedig to discuss the evolution of serverless technology and its impact on application development, exploring benefits like scalability, cost optimization, and faster dev cycles. They delve into key services and concepts in serverless design, including ... Show More
35m 28s
Nov 2022
Kubernetes on Vessels, with Louis Bailleul
Louis Bailleul is a Chief Enterprise Architect at PGS. After years of running highly-ranked super computers to process PGS’ seismic data, Louis’s team at PGS has lead a transition to Google Cloud. Listen in to learn about HPC in Google Cloud with GKE, and to explore using Kuberne ... Show More
42m 56s
Dec 2024
AI Semiconductor Landscape feat. Dylan Patel | BG2 w/ Bill Gurley & Brad Gerstner
Open Source bi-weekly convo w/ Bill Gurley and Brad Gerstner on all things tech, markets, investing & capitalism. This week they are joined by Dylan Patel, Founder & Chief Analyst at SemiAnalysis, to discuss origins of SemiAnalysis, Google's AI workload, NVIDIA's competitive edge ... Show More
1h 29m
Apr 2024
Episode 192 - Google Cloud Next 2024 Recap
Join Allen Firstenberg and guest host Stefania Pecore on Two Voice Devs as they delve into the exciting announcements and highlights from Google Cloud Next 2024! This episode focuses on the latest advancements in AI and their impact on the healthcare industry, providing valuable ... Show More
40m 35s
Mar 2025
NVIDIA RAPIDS and Open Source ML Acceleration with Chris Deotte and Jean-Francois Puget
NVIDIA RAPIDS is an open-source suite of GPU-accelerated data science and AI libraries. It leverages CUDA and significantly enhances the performance of core Python frameworks including Polars, pandas, scikit-learn and NetworkX. Chris Deotte is a Senior Data Scientist at NVIDIA an ... Show More
42m 6s
Oct 2024
Dylan Patel & Jon (Asianometry) – How the Semiconductor Industry Actually Works
A bonanza on the semiconductor industry and hardware scaling to AGI by the end of the decade.Dylan Patel runs Semianalysis, the leading publication and research firm on AI hardware. Jon Y runs Asianometry, the world’s best YouTube channel on semiconductors and business history.* ... Show More
2h 9m
Nov 2024
NVIDIA's Jensen Huang on AI Chip Design, Scaling Data Centers, and his 10-Year Bets
In this week’s episode of No Priors, Sarah and Elad sit down with Jensen Huang, CEO of NVIDIA, for the second time to reflect on the company’s extraordinary growth over the past year. Jensen discusses AI’s takeover of datacenters and NVIDIA’s rapid development of x.AI’s superclus ... Show More
36m 53s